Abstract

We study a fingerprinting game in which the number of colluders and the collusion channel are 
unknown. The encoder embeds fingerprints into a host sequence and provides the decoder with the 
capability to trace back pirated copies to the colluders. 

Fingerprinting capacity has recently been derived as the limit value of a sequence of maximin games 
with mutual information as their payoff functions. However, these games generally do not admit saddle- 
point solutions and are very hard to solve numerically. Here under the so-called Boneh-Shaw marking 
assumption, we reformulate the capacity as the value of a single two-person zero-sum game, and show 
that it is achieved by a saddle-point solution.

If the maximal coalition size is k and the fingerprinting alphabet is binary, we show that capacity 
decays quadratically with k. Furthermore, we prove rigorously that the asymptotic capacity is $1/(k^2\, 2 \ln 2)$

and we confirm our earlier conjecture that Tardos' choice of the arcsine distribution asymptotically 
maximizes the mutual information payoff function while the interleaving attack minimizes it. Along 
with the asymptotics, numerical solutions to the game for small k are also presented. 
I. Introduction

In view of the ubiquity of digital media and the development of sophisticated piracy tools, it has 
become essential to develop a reliable protection scheme for copyrighted content. Digital fingerprinting, 
in which the content distributor embeds a uniquely identified fingerprint into each distributed copy, is an 
effective way to deter unauthorized redistribution of the content. 

Hundreds of years ago, people used the idea of fingerprinting in logarithm tables. Errors were added 
intentionally to insignificant decimals that are randomly selected, with each copy having a unique set of 
modifications. If someone ever sold his copy illegally, the legal authority could easily trace the guilty 
owner (pirate) by looking into the small errors. 

The authors are with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 
Urbana, IL 61801 USA (e-mail: huang37@illinois.edu; moulin@ifp.uiuc.edu). This work was supported by the National Science 
Foundation (NSF) under grants CCF 06-35137 and CCF 07-29061. This work was presented at ISIT 2009 and WIFS 2010. 
Digital content (e.g., images, video, audio, and programs) can be protected using the same idea.
One common approach is to embed fingerprints using digital watermarking techniques. As with the
logarithm tables discussed above, the fingerprints should not impair the quality or the functionality of
the content. Most watermarking systems are also robust against attacks such as compression,
digital-to-analog conversion, or intentional addition of noise.

The most dangerous attack against fingerprinting, however, is a collusion attack. A group of experienced 
pirates can form a coalition, detect the fingerprints by inspecting the marks in each copy, and create a 
forgery that has only weak traces of their fingerprints. For the logarithm table example, if the errors are 
sparse and chosen randomly, the coalition can easily correct these errors by comparing several different 
copies of the logarithm table. Notice that since the pirates cannot remove the errors in which all their 
copies coincide, it is still possible for the distributor to design the marks so that at least one of the pirates 
can be caught (possibly with some risk of falsely accusing someone). Yet it should be apparent that
if the coalition is large, it is very hard to trace the forgery back to the fingerprinted copies from
which it was generated.

To specify the type of manipulations the coalition is capable of, different models have been adopted 
in designing the collusion-resistant fingerprinting codes. The distortion constraint is a natural model for 
fingerprinting on multimedia content [1], [2], [3], [4]. In this work, however, we adopt another setup
introduced by Boneh and Shaw in [5], called the marking assumption, which is commonly used both in
multimedia fingerprinting [6] and software fingerprinting [7]. Under this setup, the fingerprint sequence
that each user receives is represented by a string of marks. By comparing their available copies, the
colluders can modify the detected marks, but cannot modify those marks at which their copies agree. It
should be noted that there exist several versions of the marking assumption specifying different strengths
of attacks the colluders can perform [8], and our analysis is general and applies to all these variants.

A. Previous Work 

One of the first designs of fingerprinting codes that are resistant to collusion attacks is presented 
by Boneh and Shaw [5]. It was shown in [5] that a deterministic binary fingerprinting code with zero
probability of decoding error does not exist. Hence, it becomes necessary for the construction of the 
fingerprinting codes to use some form of randomization, where the random key is shared only between 
the encoder and the decoder. They also provided the first example of codes with vanishing error probability. 

Tardos in 2003 [9] constructed fingerprinting codes of length at most $100 k^2 \ln(m/\epsilon)$ for m users with
error probability at most $\epsilon$ against k pirates. This construction yields k-secure fingerprinting schemes
with $\epsilon$-error of rate $[k^2\, 100 \ln(2/\epsilon)]^{-1}$. The same paper gave an $\Omega(k^2 \log(1/\epsilon))$ bound on the length of
any fingerprinting code with the above parameters. The constant 100 in the length $100 k^2 \ln(m/\epsilon)$ was
subsequently improved by several papers [10], [11], [HI, [12]. Amiri and Tardos recently [13] further
improved the rate by constructing a code based on a two-person zero-sum game.
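To give a feel for the numbers, the following minimal sketch evaluates the length bound $100 k^2 \ln(m/\epsilon)$ quoted above and the resulting rate $\log_2(m)/n$. The function names are hypothetical and chosen only for illustration; the code implements nothing beyond the formula in the text.

```python
import math


def tardos_code_length(k: int, m: int, eps: float) -> int:
    """Smallest integer that is at least 100 * k^2 * ln(m / eps)."""
    return math.ceil(100 * k * k * math.log(m / eps))


def code_rate(m: int, n: int) -> float:
    """Rate in bits per fingerprint symbol for m users and length-n fingerprints."""
    return math.log2(m) / n


if __name__ == "__main__":
    # The length grows quadratically with the coalition size k.
    for k in (2, 5, 10):
        n = tardos_code_length(k, m=10**6, eps=1e-3)
        print(k, n, code_rate(10**6, n))
```

Doubling k roughly quadruples the required length, which is the quadratic dependence discussed throughout the paper.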

A few researchers have also studied the problem from the information-theoretic point of view ifll. fill. 
0, ifTBl . Here the main objective is to find the maximum achievable rate, or capacity, of the fingerprinting 
system. We denote capacity by $C_k$, where k is the maximum coalition size. For the binary alphabet, Tardos'
construction suggested that $C_k \ge (k^2\, 100 \ln 2)^{-1}$. Anthapadmanabhan et al. [7] proved that $C_k = O(1/k)$.
Recently, Moulin [4] provided the exact formula of capacity in a general setup that unifies the
signal-distortion and Boneh-Shaw formulations of fingerprinting. The formula can be regarded as the limit value
of a sequence of maximin games, which, however, is extremely difficult to evaluate in general. 

Two families of fingerprinting decoding schemes are also introduced in [4]: simple decoding and joint
decoding. Tardos' breakthrough randomized fingerprinting code in [9] and its subsequent works
belong to the class of simple decoders. Simple decoding falls short of the capacity-achieving goal: reliable
performance is impossible at code rates greater than some value $C_k^{\mathrm{simple}}$ that is strictly less than the
capacity $C_k$. Yet its simple and efficient algorithm makes it desirable for practical use. On the other hand,
Amiri and Tardos' recent work [13] belongs to the joint decoding family. Although capacity-achieving, it is much
more complex than the simple decoding scheme and is only useful when computation is not an issue.
Another example of joint decoding is Dumer's work in [14], where additional constraints are imposed
in the analysis.

B. Main Results 

Our work follows the Boneh-Shaw marking assumption. For both joint and simple decoding, we 
reformulate the maximum achievable rates of both schemes as the respective maximin values of two 
two-person zero-sum games. We further show that the maximin and minimax values of the games are 
always equal, and the values are achieved by saddle-point solutions. 

In the binary alphabet case, new capacity bounds are provided in closed-form expressions. The ratio 
between the upper and lower bounds of the joint decoding scheme is $\pi^2/2$, while that of the simple
decoding scheme is only $\pi^2/4$ (for large k). These bounds not only show that the binary fingerprinting
capacity is in $\Theta(1/k^2)$, but they also provide secure strategies for both players of the game. Numerical
solutions for small k are also presented in comparison with the bounds. 

Asymptotic analysis for large coalitions is based on a mild regularity assumption. When k is large, the 
fingerprinting game for joint decoding approximates a continuous-kernel game, whose optimal
strategies can be found explicitly: the arcsine distribution and the interleaving attack. Finally, we give
a higher level interpretation from the standpoint of statistical decision theory. 

The outline of the paper is as follows: In Sec. II we introduce our fingerprinting model and formally
define fingerprinting capacity. The capacity formulas derived in [4] are briefly reviewed and reformulated
in Sec. III. Sec. IV and Sec. V are devoted to the binary alphabet case and Sec. VI gives a brief summary.

C. Notation 

We use capital letters to represent random variables, and lowercase letters to represent their realizations. Boldface
denotes vectors, and calligraphic letters denote finite sets. For example, $\mathbf{X} \in \mathcal{X}^n$ denotes a random vector
$(X_1, \ldots, X_n)$, with each $X_i$ taking values in $\mathcal{X}$. The probability distribution of $\mathbf{X}$ is characterized by
its distribution function $P_{\mathbf{X}}(\mathbf{x}) = \Pr(X_1 \le x_1, \ldots, X_n \le x_n)$. If the distribution is discrete, we also
describe it by its probability mass function (pmf) $p_{\mathbf{X}}(\mathbf{x}) = \Pr(\mathbf{X} = \mathbf{x})$. Otherwise, if $P_{\mathbf{X}}$ has the form

$$ P_{\mathbf{X}}(\mathbf{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_{\mathbf{X}}(\mathbf{x}) \, dx_1 \ldots dx_n $$

then we characterize the distribution by its probability density function (pdf) $f_{\mathbf{X}}$. Mathematical expectation
of a function $g(\mathbf{X})$ with respect to $P_{\mathbf{X}}$ is defined by

$$ E_{P_{\mathbf{X}}}[g(\mathbf{X})] \triangleq \int g(\mathbf{x}) \, P_{\mathbf{X}}(d\mathbf{x}). $$

The mutual information of X and Y is denoted by $I(X;Y) = H(X) - H(X|Y)$. Should the
dependency on the underlying pmf's be explicit, we write the pmf's as subscripts, e.g. $H_{p_X}(X)$ and
$I_{p_X p_{Y|X}}(X;Y)$. Given a pair of sequences $(\mathbf{x}, \mathbf{y})$, we denote by $I(\mathbf{x}; \mathbf{y})$ the empirical mutual information
of the joint pmf $p_{\mathbf{x}\mathbf{y}}$. We also denote the binary entropy function by $h(p) = -p \log p - (1-p) \log(1-p)$
and $h(\mathbf{p}) = (h(p_1), \ldots, h(p_n))'$. The Kullback-Leibler divergence between two pmf's p and q is denoted
by $D(p \| q)$, and the Kullback-Leibler divergence between two Bernoulli random variables with respective
expectations p and q is denoted by $d(p \| q) = p \log \frac{p}{q} + (1-p) \log \frac{1-p}{1-q}$, where log denotes base 2 logarithm
and ln denotes natural logarithm throughout the paper.
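The quantities defined above are easy to compute directly. The sketch below is illustrative only (the function names are not from the paper): it implements the binary entropy $h(p)$, the Bernoulli divergence $d(p\|q)$, and the empirical mutual information $I(\mathbf{x};\mathbf{y})$ of two equal-length sequences, all in bits.

```python
import math
from collections import Counter


def h(p):
    """Binary entropy in bits, with the convention h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def d(p, q):
    """KL divergence in bits between Bernoulli(p) and Bernoulli(q)."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            if b <= 0:
                return math.inf  # mass where q has none: divergence is infinite
            total += a * math.log2(a / b)
    return total


def empirical_mutual_information(x, y):
    """I(x; y) in bits, computed from the empirical joint pmf of the two sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # Each term is p(a,b) * log2( p(a,b) / (p(a) p(b)) ) with p = count / n.
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())
```

For instance, a sequence compared with itself yields its empirical entropy, while two sequences whose empirical joint pmf factorizes yield zero.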

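Sequences are denoted by $(\cdot)$. The size or cardinality of a finite set $\mathcal{A}$ is denoted by $|\mathcal{A}|$. The indicator
function of a subset $\mathcal{A}$ of a set $\mathcal{X}$ is a function $\mathbb{1}_{\mathcal{A}} : \mathcal{X} \to \{0, 1\}$ defined as

$$ \mathbb{1}_{\mathcal{A}}(x) = \begin{cases} 1 & \text{if } x \in \mathcal{A} \\ 0 & \text{if } x \notin \mathcal{A}. \end{cases} $$

The power set of a finite set $\mathcal{X}$, denoted by $2^{\mathcal{X}}$, is the set of all subsets of $\mathcal{X}$, including the empty set
and $\mathcal{X}$ itself. The support of a probability distribution P, denoted by supp(P), is the smallest set whose
complement has probability zero. The support of a family $\mathscr{P}$ of probability distributions, denoted by
$\mathrm{supp}(\mathscr{P})$, is the union of the support of each probability distribution in the family, i.e., $\bigcup_{P \in \mathscr{P}} \mathrm{supp}(P)$.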

Asymptotic notations are defined as follows: Suppose f(k) and g(k) are two functions defined on
positive real numbers. We say $f(k) = O(g(k))$ if $\exists c_1 > 0, k_1 > 0$ such that $f(k) \le c_1 g(k), \forall k > k_1$.
Also, $f(k) = \Omega(g(k))$ if $\exists c_2 > 0, k_2 > 0$ such that $f(k) \ge c_2 g(k), \forall k > k_2$. We write $f(k) = \Theta(g(k))$
if $f(k) = O(g(k))$ and $f(k) = \Omega(g(k))$. The expression $f(k) = o(g(k))$ or $f(k) = \omega(g(k))$ means that
$f(k)/g(k)$ tends to 0 or $\infty$ respectively. The shorthand $f \sim g$, $f \gtrsim g$, and $f \lesssim g$ denote the asymptotic
relations $\lim_{k \to \infty} \frac{f(k)}{g(k)} = 1$, $\liminf_{k \to \infty} \frac{f(k)}{g(k)} \ge 1$, and $\limsup_{k \to \infty} \frac{f(k)}{g(k)} \le 1$ respectively.

II. Fingerprinting codes and capacity 

A. Overview 

The model for our fingerprinting system is shown in Fig. 1. Let $\mathcal{X} = \{0, 1, \ldots, q-1\}$ denote a size-q
fingerprint alphabet and let $\mathcal{M} = \{1, \ldots, m\}$ denote the set of user indices. An (n, m) fingerprinting
code $(e_n, d_n)$ over $\mathcal{X}$ consists of an encoder and a decoder. The encoder

$$ e_n : \mathcal{M} \times \mathcal{V}_n \to \mathcal{X}^n \tag{1} $$

Fig. 1. Fingerprinting system model under the marking assumption

assigns user i a length-n fingerprint $\mathbf{X}_i$, where $\mathcal{V}_n$ is the alphabet of the secret key $V_n$, which is a random
variable whose realization is known to the encoder and the decoder, but unknown to the pirates.

We denote by $\mathcal{K} = \{i_1, \ldots, i_K\}$ the index set of a coalition of K pirates and by $\mathbf{X}_{\mathcal{K}} = \{\mathbf{X}_i : i \in \mathcal{K}\}$
the fingerprints available to them. The collusion channel produces the forgery $\mathbf{Y} \in \mathcal{Y}^n$ according
to some conditional probability mass function (pmf) $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$. The Boneh-Shaw marking assumption is
imposed on $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$, which allows the colluders to change only the symbols at the positions where they
find differences.

Not knowing the actual collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$, the decoder

$$ d_n : \mathcal{Y}^n \times \mathcal{V}_n \to 2^{\mathcal{M}} \tag{2} $$

produces an estimate $\hat{\mathcal{K}}$ of the coalition. Note that the actual number of pirates K is known neither to
the encoder nor to the decoder, so the code design is based on a nominal coalition size k. Also note
that the empty set is an admissible decoder output, which allows the decoder to accuse no user when not
enough evidence is available, especially when the actual K is larger than k.

B. Randomized Fingerprinting Codes 

The formal definition of an ensemble of fingerprinting codes is as follows. 

Definition 1. A fingerprinting ensemble $(E_n, D_n)$ is formed by the fingerprinting embedder randomly
choosing from a family $\{e_n(\cdot, v_n), d_n(\cdot, v_n), v_n \in \mathcal{V}_n\}$ of (n, m) fingerprinting codes according to some
probability distribution on the set $\mathcal{V}_n$ of keys.

We assume that the family of fingerprinting codes and the probability distribution on $\mathcal{V}_n$ are known to
the public, but the realization $v_n$ is only known to the encoder and the decoder.
As shown in [4], it suffices to consider the following two-phase fingerprinting construction and
joint/simple decoding scheme in studying capacity. The secret key $V_n$ shared by the encoder and the
decoder in this scheme is the set of random variables $\{\mathbf{W}_j\}_{j=1}^n \cup (X_{ij})_{m \times n}$.

1) Encoding Scheme: Let $P_W$ be a probability distribution on the (q − 1)-dimensional simplex

$$ \mathcal{W}_q = \Big\{ \mathbf{w} \in \mathbb{R}^q : \sum_{x \in \mathcal{X}} w_x = 1 \text{ and } 0 \le w_x \le 1, \; x \in \mathcal{X} \Big\}. \tag{3} $$

A sequence of auxiliary "time-sharing" random variables $\{\mathbf{W}_j\}_{j=1}^n$ is drawn independently and identically
from the distribution $P_W$. For each $j \in \{1, \ldots, n\}$, $\{X_{ij}\}_{i=1}^m$ are m independent and identically
distributed random variables constructed from a categorical distribution^1 with parameter $\mathbf{W}_j$, i.e.,

$$ \Pr(X_{1,j} = x_1, \ldots, X_{m,j} = x_m \mid \mathbf{W}_j = \mathbf{w}) = \prod_{i=1}^m w_{x_i}, \quad \forall x_1, \ldots, x_m \in \mathcal{X}. \tag{4} $$
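The two-phase construction above can be sketched for the binary alphabet as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, each $\mathbf{W}_j$ is represented by the scalar $w_j = \Pr(X_{ij} = 1)$, and the default embedding distribution is assumed to be the standard arcsine density $1/(\pi\sqrt{w(1-w)})$ on the unit interval, sampled as $\sin^2\theta$ with $\theta$ uniform on $(0, \pi/2)$.

```python
import math
import random


def sample_arcsine(rng):
    """Draw w in [0, 1] from the arcsine law: w = sin^2(theta), theta ~ U(0, pi/2)."""
    return math.sin(rng.uniform(0.0, math.pi / 2)) ** 2


def generate_fingerprints(m, n, rng, sample_w=sample_arcsine):
    """Two-phase encoder for q = 2 (illustrative).

    Phase 1: draw w[j] i.i.d. from the embedding distribution.
    Phase 2: for each position j, draw X[i][j] ~ Bernoulli(w[j]) independently per user.
    Returns (w, X), which together play the role of the secret key.
    """
    w = [sample_w(rng) for _ in range(n)]
    X = [[1 if rng.random() < w[j] else 0 for j in range(n)] for _ in range(m)]
    return w, X


if __name__ == "__main__":
    w, X = generate_fingerprints(m=4, n=10, rng=random.Random(0))
    for row in X:
        print("".join(map(str, row)))
```

Passing a different `sample_w` (for example `lambda rng: rng.random()` for the uniform distribution) changes the embedding distribution without touching the second phase.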

In general, there is no constraint on the choice of the embedding distribution $P_W$, which means that
we can choose it from the class of all probability distributions on $\mathcal{W}_q$, denoted by $\mathscr{P}_W$. However, we
may want to limit $P_W$ to a subclass $\mathscr{P}^e$ of $\mathscr{P}_W$ in some applications. For instance, Nuida et al. [10]
limited $P_W$ to be discrete with a finite spectrum. Furon and Perez-Freire [6] studied the case when $P_W$
is the arcsine distribution (defined below) or the uniform distribution over the unit interval for binary
fingerprinting codes, in which case $\mathscr{P}^e$ is just a singleton.

In most of our results we require $\mathscr{P}^e$ to be compact. In some results we also require the following
condition. Note that $\mathscr{P}^e$ satisfying (5) is compact.

Condition 1. $\mathscr{P}^e$ coincides with the class of all probability distributions on $\mathrm{supp}(\mathscr{P}^e)$, i.e.,

$$ \mathscr{P}^e = \{ P_W \in \mathscr{P}_W : \mathrm{supp}(P_W) \subseteq \mathrm{supp}(\mathscr{P}^e) \}. \tag{5} $$

Analogous to the symbol-symmetric fingerprinting codes proposed by Skoric et al. (H, it is intuitively 
reasonable to adopt a probability distribution Pw that is invariant to permutations of the symbols. 
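Formally, let $\pi$ be a permutation of $\mathcal{X}$ and define

$$ P_W^{\pi}(w_0, \ldots, w_{q-1}) \triangleq P_W(w_{\pi(0)}, \ldots, w_{\pi(q-1)}). \tag{6} $$

Then we have

Definition 2. An embedding distribution $P_W$ is symbol-symmetric if

$$ P_W^{\pi} = P_W, \quad \forall \pi. \tag{7} $$

Definition 3. A subset $\mathscr{P}^e$ of $\mathscr{P}_W$ is said to be symbol-symmetric if

$$ P_W \in \mathscr{P}^e \Rightarrow P_W^{\pi} \in \mathscr{P}^e, \quad \forall \pi. \tag{8} $$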

^1 The categorical distribution is a special case of the multinomial distribution with the number of trials set to 1.

Definition 4. The symbol-symmetric subclass of an embedding class $\mathscr{P}^e$ is defined by

$$ \mathscr{P}^e_{\mathrm{sym}} = \{ P_W \in \mathscr{P}^e : P_W \text{ is symbol-symmetric} \}. \tag{9} $$

The optimality of limiting $P_W$ to $\mathscr{P}^e_{\mathrm{sym}}$ will be discussed in the next section.

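2) Decoding Scheme: We briefly review Moulin's two decoding schemes proposed in [4]: the simple
decoder tests candidate fingerprints one by one, while the joint decoder utilizes a joint decoding rule.
The simple decoder evaluates the empirical mutual information $I(\mathbf{x}_i; \mathbf{y} | \mathbf{w})$ for each user i. A threshold
$\eta^{\mathrm{simple}}$ is chosen and user i is accused if and only if $I(\mathbf{x}_i; \mathbf{y} | \mathbf{w}) > \eta^{\mathrm{simple}}$. If $I(\mathbf{x}_i; \mathbf{y} | \mathbf{w}) \le \eta^{\mathrm{simple}}$ for
all $i \in \mathcal{M}$, then $\hat{\mathcal{K}} = \emptyset$. The joint decoder evaluates the following score for each coalition $\mathcal{A} \subseteq \mathcal{M}$:

$$ S(\mathcal{A}) = \begin{cases} 0, & \mathcal{A} = \emptyset \\ I(\mathbf{x}_{\mathcal{A}}; \mathbf{y} | \mathbf{w}) - |\mathcal{A}| \, \eta^{\mathrm{joint}}, & \text{otherwise} \end{cases} \tag{10} $$

where $\eta^{\mathrm{joint}}$ is a threshold. The set $\mathcal{A}$ that has the largest score is then accused. Through the parameters
$\eta^{\mathrm{simple}}$ and $\eta^{\mathrm{joint}}$, both decoders allow the trade-off between false-positive and false-negative
error probabilities to be tuned.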

It is shown in [4] that the joint decoding scheme achieves capacity while the simple decoding scheme
has a smaller maximum achievable rate. However, the computational complexity of a joint decoder is
generally vastly greater than that of a simple decoder.
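The simple decoding rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the decoder of [4]: the function names are hypothetical, users are indexed from 0, and the conditioning sequence w is assumed to take values in a finite set (for example after quantization), so that the empirical conditional mutual information is a weighted average over the groups of positions sharing the same value of w.

```python
import math
from collections import Counter, defaultdict


def empirical_mi(x, y):
    """Empirical mutual information I(x; y) in bits."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def empirical_conditional_mi(x, y, w):
    """Empirical I(x; y | w): average of I(x; y) within each group of equal w."""
    groups = defaultdict(list)
    for a, b, c in zip(x, y, w):
        groups[c].append((a, b))
    n = len(x)
    total = 0.0
    for pairs in groups.values():
        xs, ys = zip(*pairs)
        total += len(pairs) / n * empirical_mi(xs, ys)
    return total


def simple_decoder(X, y, w, threshold):
    """Accuse user i iff I(x_i; y | w) > threshold; may return the empty set."""
    return {i for i, x_i in enumerate(X)
            if empirical_conditional_mi(x_i, y, w) > threshold}
```

Each user is tested independently, so the cost is linear in the number of users; a joint decoder would instead score candidate coalitions, which is what makes it far more expensive.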

C. Collusion Channel 

Upon receiving the K fingerprinted copies $\{\mathbf{X}_{i_1}, \ldots, \mathbf{X}_{i_K}\}$, the pirates attempt to generate a forgery
$\mathbf{Y}$ subject to the marking assumption. A coordinate j is called undetectable if

$$ x_{i_1,j} = x_{i_2,j} = \cdots = x_{i_K,j} \tag{11} $$

and is called detectable otherwise. The Boneh-Shaw marking assumption states that for any forgery $\mathbf{y}$
generated by the coalition, we have $y_j = x_{i_1,j}$ for every undetectable coordinate j.
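The definition of undetectable coordinates and the marking assumption translate directly into a check on sequences. The helper names below are illustrative only.

```python
def undetectable_coordinates(copies):
    """Indices j at which all colluders' copies carry the same symbol."""
    n = len(copies[0])
    return [j for j in range(n) if len({copy[j] for copy in copies}) == 1]


def satisfies_marking_assumption(copies, forgery):
    """True iff the forgery agrees with the copies at every undetectable coordinate."""
    return all(forgery[j] == copies[0][j]
               for j in undetectable_coordinates(copies))


if __name__ == "__main__":
    copies = [[0, 1, 1, 0], [0, 0, 1, 1]]
    print(undetectable_coordinates(copies))                      # positions 0 and 2
    print(satisfies_marking_assumption(copies, [0, 1, 1, 1]))    # allowed forgery
    print(satisfies_marking_assumption(copies, [1, 1, 1, 1]))    # changes position 0
```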
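We assume the collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ adopted by the pirates is memoryless, that is,

$$ p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} | \mathbf{x}_{\mathcal{K}}) = \prod_{j=1}^n p_{Y|X_{\mathcal{K}}}(y_j | x_{\mathcal{K},j}). \tag{12} $$

As shown in [4], the memoryless restriction can be relaxed without changing the fingerprinting capacity.
For simplicity we impose this constraint so the colluders' strategy is limited to the choice of the
single-letter channel $p_{Y|X_{\mathcal{K}}}$. Denoted by $\mathscr{P}_{\mathrm{mark}}$, the class of attacks satisfying the marking assumption can
be written as

$$ \mathscr{P}_{\mathrm{mark}} = \{ p_{Y|X_{\mathcal{K}}} : p_{Y|X_{\mathcal{K}}}(y | x_{\mathcal{K}}) = 1 \text{ if } y = x_{i_1} = x_{i_2} = \cdots = x_{i_K} \}. \tag{13} $$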



Several variants of the marking assumption appear in the literature. Suppose $\mathscr{P}^c$ denotes the class
of admissible channels $p_{Y|X_{\mathcal{K}}}$. Some restrict the coalition to use only the symbols available to them,
where $\mathscr{P}^c = \{ p_{Y|X_{\mathcal{K}}} \in \mathscr{P}_{\mathrm{mark}} : p_{Y|X_{\mathcal{K}}}(y | x_{\mathcal{K}}) = 0 \text{ if } y \ne x_i, \forall i \in \mathcal{K} \}$ (usually called the restricted digit
model). Others relax the alphabet $\mathcal{Y}$ to $\mathcal{X} \cup \{\text{'?'}\}$, which allows the pirates to put erasures at detectable
coordinates (usually called the general digit model). Here we consider a general $\mathscr{P}^c$ with $\mathscr{P}^c \subseteq \mathscr{P}_{\mathrm{mark}}$
being compact.
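Membership in these classes can be checked mechanically for a single-letter channel. The sketch below represents a channel as a function `channel(y, xs)` returning $p_{Y|X_{\mathcal{K}}}(y | x_{\mathcal{K}})$; this representation and the helper names are assumptions made for illustration. As an example it uses the interleaving attack, taken here in its usual form: the forgery symbol is copied from a colluder chosen uniformly at random.

```python
from itertools import product


def interleaving_attack(y, xs):
    """p(y | x_K) when the output copies a uniformly chosen colluder's symbol."""
    return sum(1 for x in xs if x == y) / len(xs)


def in_marking_class(channel, k, alphabet=(0, 1)):
    """Check (13): p(y | y, ..., y) = 1 when all k colluders hold the same symbol y."""
    return all(abs(channel(x, (x,) * k) - 1.0) < 1e-12 for x in alphabet)


def in_restricted_digit_model(channel, k, alphabet=(0, 1)):
    """Marking assumption plus p(y | x_K) = 0 whenever no colluder holds y."""
    return in_marking_class(channel, k, alphabet) and all(
        channel(y, xs) == 0
        for xs in product(alphabet, repeat=k)
        for y in alphabet
        if y not in xs
    )


if __name__ == "__main__":
    print(in_marking_class(interleaving_attack, k=3))
    print(in_restricted_digit_model(interleaving_attack, k=3))
```

A channel that ignores its inputs, such as one that always outputs 0, fails the first check because it alters undetectable positions.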

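Given a single-letter collusion channel $p_{Y|X_{\mathcal{K}}}$, consider the user-permuted collusion channel

$$ p_{Y|X_{\pi(\mathcal{K})}}(y | x_{i_1}, \ldots, x_{i_K}) \triangleq p_{Y|X_{\mathcal{K}}}(y | x_{\pi(i_1)}, \ldots, x_{\pi(i_K)}) \tag{14} $$

where $\pi$ is a permutation of $\mathcal{K}$. We say that $p_{Y|X_{\mathcal{K}}}$ is user-symmetric if

$$ p_{Y|X_{\pi(\mathcal{K})}} = p_{Y|X_{\mathcal{K}}}, \quad \forall \pi. \tag{15} $$

A subset $\mathscr{P}^c$ of $\mathscr{P}_{\mathrm{mark}}$ is said to be user-symmetric if

$$ p_{Y|X_{\mathcal{K}}} \in \mathscr{P}^c \Rightarrow p_{Y|X_{\pi(\mathcal{K})}} \in \mathscr{P}^c, \quad \forall \pi. \tag{16} $$

Note that in general not all elements of such $\mathscr{P}^c$ are user-symmetric. The user-symmetric subclass of
a collusion class $\mathscr{P}^c$, which consists of user-symmetric collusion channels, is defined by

$$ \mathscr{P}^c_{\mathrm{fair}} = \{ p_{Y|X_{\mathcal{K}}} \in \mathscr{P}^c : p_{Y|X_{\mathcal{K}}} \text{ is user-symmetric} \}. \tag{17} $$

Symbol-symmetry can also be defined for collusion channels. Let $\pi$ be a permutation of $\mathcal{X}$ and define

$$ p^{\pi}_{Y|X_{\mathcal{K}}}(y | x_{i_1}, \ldots, x_{i_K}) \triangleq p_{Y|X_{\mathcal{K}}}(\pi(y) | \pi(x_{i_1}), \ldots, \pi(x_{i_K})). \tag{18} $$

Then we have

Definition 5. A collusion channel $p_{Y|X_{\mathcal{K}}}$ is symbol-symmetric if

$$ p^{\pi}_{Y|X_{\mathcal{K}}} = p_{Y|X_{\mathcal{K}}}, \quad \forall \pi. \tag{19} $$

Definition 6. A subset $\mathscr{P}^c$ of $\mathscr{P}_{\mathrm{mark}}$ is said to be symbol-symmetric if

$$ p_{Y|X_{\mathcal{K}}} \in \mathscr{P}^c \Rightarrow p^{\pi}_{Y|X_{\mathcal{K}}} \in \mathscr{P}^c, \quad \forall \pi. \tag{20} $$

Definition 7. The symbol-symmetric subclass of a collusion class $\mathscr{P}^c$ is defined by

$$ \mathscr{P}^c_{\mathrm{sym}} = \{ p_{Y|X_{\mathcal{K}}} \in \mathscr{P}^c : p_{Y|X_{\mathcal{K}}} \text{ is symbol-symmetric} \}. \tag{21} $$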

D. Error Probabilities and Capacity 

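Under fingerprinting ensemble $(E_n, D_n)$, nominal coalition size k (not necessarily equal to the true
coalition size K), and collusion channel $p_{Y|X_{\mathcal{K}}}$, we consider the following error probabilities:
• The probability of false positives (accusing an innocent user):

$$ P_e^{\mathrm{FP}}(E_n, D_n, p_{Y|X_{\mathcal{K}}}) = \Pr(\hat{\mathcal{K}} \setminus \mathcal{K} \ne \emptyset). \tag{22} $$

• The probability of failing to catch any single pirate:

$$ P_e^{\mathrm{one}}(E_n, D_n, p_{Y|X_{\mathcal{K}}}) = \Pr(\hat{\mathcal{K}} \cap \mathcal{K} = \emptyset). \tag{23} $$

• The probability of failing to catch the full coalition:

$$ P_e^{\mathrm{all}}(E_n, D_n, p_{Y|X_{\mathcal{K}}}) = \Pr(\mathcal{K} \not\subseteq \hat{\mathcal{K}}). \tag{24} $$

The error probabilities above can be written explicitly as

$$ P_e(E_n, D_n, p_{Y|X_{\mathcal{K}}}) = \sum_{\mathbf{x}_{\mathcal{M}}, \mathbf{y}} \left[ \prod_{j=1}^n \int P_W(d\mathbf{w}_j) \left( \prod_{i=1}^m w_{j, x_{ij}} \right) p_{Y|X_{\mathcal{K}}}(y_j | x_{\mathcal{K},j}) \right] \mathbb{1}_{\mathcal{E}} \tag{25} $$

where the error event $\mathcal{E}$ is given by $\mathcal{E}^{\mathrm{FP}} = \{ d_n(\mathbf{y}, v_n) \setminus \mathcal{K} \ne \emptyset \}$, $\mathcal{E}^{\mathrm{one}} = \{ d_n(\mathbf{y}, v_n) \cap \mathcal{K} = \emptyset \}$, and
$\mathcal{E}^{\mathrm{all}} = \{ \mathcal{K} \not\subseteq d_n(\mathbf{y}, v_n) \}$, when $P_e$ is given by (22), (23), and (24) respectively. The worst-case error
probability for a collusion class $\mathscr{P}^c$ is given by

$$ P_{e,k}(E_n, D_n, \mathscr{P}^c) = \max_{|\mathcal{K}| \le k} \; \max_{p_{Y|X_{\mathcal{K}}} \in \mathscr{P}^c} P_e(E_n, D_n, p_{Y|X_{\mathcal{K}}}). \tag{26} $$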
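The three error events are plain set relations between the decoder output $\hat{\mathcal{K}}$ and the true coalition $\mathcal{K}$, which the following sketch makes explicit. The function names are illustrative.

```python
def false_positive(accused, coalition):
    """Event in (22): at least one accused user is innocent."""
    return bool(set(accused) - set(coalition))


def miss_every_pirate(accused, coalition):
    """Event in (23): no member of the coalition is accused."""
    return not (set(accused) & set(coalition))


def miss_some_pirate(accused, coalition):
    """Event in (24): the coalition is not contained in the accused set."""
    return not set(coalition) <= set(accused)


if __name__ == "__main__":
    coalition = {2, 5}
    for accused in (set(), {2}, {2, 5}, {2, 7}):
        print(sorted(accused),
              false_positive(accused, coalition),
              miss_every_pirate(accused, coalition),
              miss_some_pirate(accused, coalition))
```

Note that the empty output never triggers a false positive but always counts as an error under both the detect-one and detect-all criteria when the coalition is nonempty.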

Having defined the error probabilities of the randomized fingerprinting scheme, we now define the 
notion of capacity. 

Definition 8. A rate R is achievable for embedding class $\mathscr{P}^e$, collusion class $\mathscr{P}^c$, and size-k coalitions
under the detect-one criterion if there exists a sequence of fingerprinting ensembles $(E_n, D_n)$ generated
by $P_W \in \mathscr{P}^e$ for $m = \lceil 2^{nR} \rceil$ users such that both $P_{e,k}^{\mathrm{FP}}(E_n, D_n, \mathscr{P}^c)$ and $P_{e,k}^{\mathrm{one}}(E_n, D_n, \mathscr{P}^c)$ vanish as
n tends to infinity.

Definition 9. A rate R is achievable for embedding class $\mathscr{P}^e$, collusion class $\mathscr{P}^c$, and size-k coalitions
under the detect-all criterion if there exists a sequence of fingerprinting ensembles $(E_n, D_n)$ generated
by $P_W \in \mathscr{P}^e$ for $m = \lceil 2^{nR} \rceil$ users such that both $P_{e,k}^{\mathrm{FP}}(E_n, D_n, \mathscr{P}^c)$ and $P_{e,k}^{\mathrm{all}}(E_n, D_n, \mathscr{P}^c)$ vanish as
n tends to infinity.

Definition 10. Fingerprinting capacities $C_k^{\mathrm{one}}(\mathscr{P}^e, \mathscr{P}^c)$ and $C_k^{\mathrm{all}}(\mathscr{P}^e, \mathscr{P}^c)$ are the suprema of all
achievable rates with respect to the detect-one and detect-all criteria, respectively.

Remark 1. When the embedding class $\mathscr{P}^e$ is a singleton $\{P_W\}$ or the collusion class $\mathscr{P}^c$ is a singleton
$\{p_{Y|X_{\mathcal{K}}}\}$, we denote the corresponding capacities as $C_k(P_W, \cdot)$ and $C_k(\cdot, p_{Y|X_{\mathcal{K}}})$ respectively, which is
a slight abuse of notation.

III. Mutual information games 

In this section we first review the mutual information games associated with both the joint and 
the simple decoding schemes in [4]. We show how these games can be simplified under the marking
assumption, and we show the existence of saddle-point solutions. 
A. Mutual Information Game for Joint Decoder

We use the special symbol $\mathsf{K}$ to denote the set $\{1, 2, \ldots, k\}$, where k is the nominal coalition size
introduced in Sec. II-A. To present the capacity formula, we first introduce the following setup: for a
fixed embedding class $\mathscr{P}^e$, let

$$ \mathscr{P}^e_l \triangleq \{ p_W \in \mathscr{P}^e : |\mathrm{supp}(p_W)| \le l \} \tag{27} $$

be the class of probability distributions with finite spectrum composed of no more than l points of the
(q − 1)-dimensional simplex $\mathcal{W}_q$. A random variable $\mathbf{W}$ is drawn from some $p_W \in \mathscr{P}^e_l$ and $\{X_i\}_{i=1}^k$
are independent and identically distributed according to the categorical distribution with parameter $\mathbf{W}$, i.e.,

$$ p_{X_{\mathsf{K}}|W}(x_{\mathsf{K}} | \mathbf{w}) = \prod_{i=1}^k p_{X|W}(x_i | \mathbf{w}) \tag{28} $$

where

$$ p_{X|W}(x | \mathbf{w}) = w_x, \quad x \in \mathcal{X}. $$

The collusion class $\mathscr{P}^c$ is the set of all feasible channels $p_{Y|X_{\mathsf{K}}}$. Let

$$ C_k^{\mathrm{joint},l}(\mathscr{P}^e, \mathscr{P}^c) \triangleq \max_{p_W \in \mathscr{P}^e_l} \; \min_{p_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c} \frac{1}{k} I(X_{\mathsf{K}}; Y | W). \tag{29} $$

The following theorem summarizes the main results of fingerprinting capacity proposed in [4] under the 
marking assumption. 

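Theorem 1. Assume that Condition 1 is satisfied and $\mathscr{P}^c$ is compact. Then

1) $C_k^{\mathrm{all}}(\mathscr{P}^e, \mathscr{P}^c) \le C_k^{\mathrm{one}}(\mathscr{P}^e, \mathscr{P}^c)$. In particular, the detect-all fingerprinting capacity $C_k^{\mathrm{all}}(\mathscr{P}^e, \mathscr{P}_{\mathrm{mark}})$
under the marking assumption is zero.

2) Suppose further that $\mathscr{P}^c$ is user-symmetric. Then

$$ C_k^{\mathrm{one}}(\mathscr{P}^e, \mathscr{P}^c) = C_k^{\mathrm{one}}(\mathscr{P}^e, \mathscr{P}^c_{\mathrm{fair}}) = C_k^{\mathrm{all}}(\mathscr{P}^e, \mathscr{P}^c_{\mathrm{fair}}) = \lim_{l \to \infty} C_k^{\mathrm{joint},l}(\mathscr{P}^e, \mathscr{P}^c). \tag{30} $$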

Theorem 1 states that the detect-all capacity can never exceed the detect-one capacity, which is no
surprise since it can only be harder for the decoder to detect all the pirates than to detect only one of
the pirates. However, if the collusion channel is user-symmetric, which we can intuitively think of as
the case when each colluder "contributes" the same number of samples to the forgery (hence the term
"fair"), then the detect-one and the detect-all capacities are the same.

Now since the detect-all capacity is null under the marking assumption, we will in the rest of the paper
refer to the detect-one capacity $C_k^{\mathrm{one}}(\mathscr{P}^e, \mathscr{P}^c)$, denoted by $C_k^{\mathrm{joint}}(\mathscr{P}^e, \mathscr{P}^c)$, as the joint fingerprinting
capacity for embedding class $\mathscr{P}^e$ and collusion class $\mathscr{P}^c$.

From the game-theoretic point of view, $C_k^{\mathrm{joint},l}$ is the maximin value of a two-person zero-sum game for
each l. Observe that the sequence $(C_k^{\mathrm{joint},l})_{l=1}^{\infty}$ is nondecreasing since $(\mathscr{P}^e_l)_{l=1}^{\infty}$ is nondecreasing (i.e.,
$\mathscr{P}^e_1 \subseteq \mathscr{P}^e_2 \subseteq \cdots$). Thus the game can be interpreted as follows: the maximizer, the fingerprint
embedder, picks $p_W$ with an increasing flexibility in the support size, while the minimizer, the
coalition, counters the embedder's choice for each l by minimizing the mutual information payoff function.
Fingerprinting capacity is the limit value of the sequence of maximin games.
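To make the payoff of (29) concrete, the sketch below evaluates $\frac{1}{k} I(X_{\mathsf{K}}; Y | W = w)$ in a special case. The assumptions are mine and are not stated at this point of the text: the alphabet is binary, w is the scalar $\Pr(X = 1)$, and the channel is user-symmetric, so it is fully described by a list `p` with `p[z]` the probability that Y = 1 when z of the k colluders hold a 1 (the marking assumption then forces `p[0] = 0` and `p[k] = 1`). Under these assumptions the payoff is $H(Y | W = w) - H(Y | X_{\mathsf{K}}, W = w)$ with the number of ones binomially distributed.

```python
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def binom_pmf(k, w):
    """Pmf of the number of ones among k i.i.d. Bernoulli(w) symbols."""
    return [math.comb(k, z) * w**z * (1 - w) ** (k - z) for z in range(k + 1)]


def joint_payoff(w, p):
    """(1/k) I(X_K; Y | W = w) in bits for a user-symmetric binary channel p[z]."""
    k = len(p) - 1
    pz = binom_pmf(k, w)
    py1 = sum(a * b for a, b in zip(pz, p))          # Pr(Y = 1 | W = w)
    cond = sum(a * h(b) for a, b in zip(pz, p))      # H(Y | X_K, W = w)
    return (h(py1) - cond) / k


if __name__ == "__main__":
    k = 3
    interleaving = [z / k for z in range(k + 1)]     # p[z] = z / k
    for w in (0.1, 0.3, 0.5):
        print(w, joint_payoff(w, interleaving))
```

Averaging `joint_payoff` over a discrete embedding distribution with at most l mass points gives the payoff of the l-th game in the sequence.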

However, the maximin game of (29) is in general very difficult to solve even for small values of l
since a saddle-point solution cannot be guaranteed. For the binary alphabet (q = 2) and l = 1, we can
derive the maximin value as

which is not achieved by a saddle-point solution when k > 2. Also, this is a very loose lower bound on
$C_k^{\mathrm{joint}}(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}})$ for large k compared with the $\Theta(k^{-2})$ bound we will show in Sec. IV-C.

B. Mutual Information Game for Simple Decoder 

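As mentioned in Sec. II-B, joint decoding is computationally very complex. Thus it is also of interest
to study the maximum achievable rate of the simple decoding scheme.

Theorem 2. Assume that Condition 1 is satisfied and $\mathscr{P}^c$ is compact and user-symmetric. Let

$$ C_k^{\mathrm{simple},l}(\mathscr{P}^e, \mathscr{P}^c) = \max_{p_W \in \mathscr{P}^e_l} \; \min_{p_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c_{\mathrm{fair}}} I(X_1; Y | W) \tag{32} $$

for $l \ge 1$ and let

$$ C_k^{\mathrm{simple}}(\mathscr{P}^e, \mathscr{P}^c) = \lim_{l \to \infty} C_k^{\mathrm{simple},l}(\mathscr{P}^e, \mathscr{P}^c). \tag{33} $$

Then all rates below $C_k^{\mathrm{simple}}(\mathscr{P}^e, \mathscr{P}^c)$ are achievable by the simple decoding scheme for embedding
class $\mathscr{P}^e$, collusion class $\mathscr{P}^c$, and size-k coalitions under the detect-one criterion.

Corollary 1. For $\mathscr{P}^e$ satisfying Condition 1 and compact and user-symmetric $\mathscr{P}^c$, we have

$$ C_k^{\mathrm{simple}}(\mathscr{P}^e, \mathscr{P}^c) \le C_k^{\mathrm{joint}}(\mathscr{P}^e, \mathscr{P}^c). \tag{34} $$

Proof: See [4]. ■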
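The payoff $I(X_1; Y | W = w)$ of the simple game can be evaluated in the same restricted setting used for illustration elsewhere: binary alphabet, scalar $w = \Pr(X = 1)$, and a user-symmetric channel described by `p[z]`, the probability that Y = 1 when z of the k colluders hold a 1. These modelling choices and the function names are assumptions of this sketch, not statements from the text.

```python
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def binom_pmf(k, w):
    """Pmf of the number of ones among k i.i.d. Bernoulli(w) symbols."""
    return [math.comb(k, z) * w**z * (1 - w) ** (k - z) for z in range(k + 1)]


def simple_payoff(w, p):
    """I(X_1; Y | W = w) in bits for a user-symmetric binary channel p[z]."""
    k = len(p) - 1
    rest = binom_pmf(k - 1, w)                           # ones among the other k-1 colluders
    a1 = sum(r * p[z + 1] for z, r in enumerate(rest))   # Pr(Y = 1 | X_1 = 1, W = w)
    a0 = sum(r * p[z] for z, r in enumerate(rest))       # Pr(Y = 1 | X_1 = 0, W = w)
    py1 = w * a1 + (1 - w) * a0                          # Pr(Y = 1 | W = w)
    return h(py1) - w * h(a1) - (1 - w) * h(a0)


if __name__ == "__main__":
    k = 2
    interleaving = [z / k for z in range(k + 1)]
    print(simple_payoff(0.5, interleaving))
```

For k = 2, w = 1/2 and the interleaving channel, this gives $1 - h(1/4) \approx 0.19$ bits, below the corresponding per-user joint payoff of 0.25 bits, in line with the ordering in Corollary 1.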
Although we do not have a notion of capacity for the quantity $C_k^{\mathrm{simple}}$, it will be referred to as the
"simple" fingerprinting capacity, as opposed to the joint fingerprinting capacity discussed in the previous subsection.

C. Two-Person Zero-Sum Games of Fingerprinting Capacity 

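To establish the desired saddle-point property, we first reformulate both the joint and the simple
fingerprinting capacities as the respective values of the following two fingerprinting maximin games.

Theorem 3. Assume that Condition 1 is satisfied and $\mathscr{P}^c$ is compact and user-symmetric. Then

$$ C_k^{\mathrm{joint}}(\mathscr{P}^e, \mathscr{P}^c) = \max_{P_W \in \mathscr{P}^e} \; \min_{p_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c} \frac{1}{k} I(X_{\mathsf{K}}; Y | W) \tag{35} $$

and

$$ C_k^{\mathrm{simple}}(\mathscr{P}^e, \mathscr{P}^c) = \max_{P_W \in \mathscr{P}^e} \; \min_{p_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c_{\mathrm{fair}}} I(X_1; Y | W). \tag{36} $$

Proof: Let

$$ I_k^{\mathrm{joint}}(\mathbf{w}, p_{Y|X_{\mathsf{K}}}) \triangleq \frac{1}{k} I(X_{\mathsf{K}}; Y | W = \mathbf{w}) \tag{37} $$

and let

$$ I_k^{\mathrm{simple}}(\mathbf{w}, p_{Y|X_{\mathsf{K}}}) \triangleq I(X_1; Y | W = \mathbf{w}). \tag{38} $$

Then the payoff functions of (35) and (36) become $E_{P_W}[I_k^{\mathrm{joint}}(\mathbf{W}, p_{Y|X_{\mathsf{K}}})]$ and
$E_{P_W}[I_k^{\mathrm{simple}}(\mathbf{W}, p_{Y|X_{\mathsf{K}}})]$ respectively. Denote also the right-hand sides of (35) and (36) by
$\underline{C}_k^{\mathrm{joint}}(\mathscr{P}^e, \mathscr{P}^c)$ and $\underline{C}_k^{\mathrm{simple}}(\mathscr{P}^e, \mathscr{P}^c)$ respectively. The inequality

$$ C_k(\mathscr{P}^e, \mathscr{P}^c) \le \underline{C}_k(\mathscr{P}^e, \mathscr{P}^c) \tag{39} $$

follows directly from the fact that $\mathscr{P}^e_l \subseteq \mathscr{P}^e$ for each l. Now let the optimal achieving distributions for
(35) or (36) be $P_W^{(k)}$ and $p^{(k)}_{Y|X_{\mathsf{K}}}$. Then by completeness of $\mathscr{P}^e$, there exists a sequence of distributions
$(P_W^{l})_{l=1}^{\infty}$ with $P_W^{l} \in \mathscr{P}^e_l$ that converges in distribution to $P_W^{(k)}$. Both the functions $I_k^{\mathrm{joint}}$ and $I_k^{\mathrm{simple}}$ are
bounded and continuous with respect to $\mathbf{w}$. By [15, p.249 Theorem 1] we have

$$ \lim_{l \to \infty} E_{P_W^{l}}\big[ I_k(\mathbf{W}, p^{(k)}_{Y|X_{\mathsf{K}}}) \big] = E_{P_W^{(k)}}\big[ I_k(\mathbf{W}, p^{(k)}_{Y|X_{\mathsf{K}}}) \big] \tag{40} $$

and thus

$$ C_k(\mathscr{P}^e, \mathscr{P}^c) \ge \lim_{l \to \infty} E_{P_W^{l}}\big[ I_k(\mathbf{W}, p^{(k)}_{Y|X_{\mathsf{K}}}) \big] = \underline{C}_k(\mathscr{P}^e, \mathscr{P}^c). \tag{41} $$

Combining (39) and (41) yields (35) and (36). ■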
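Theorem 3 shows that the joint and simple capacities are the maximin values of two single two-person
zero-sum games. Note that the theorem only specifies the capacities when the embedding class $\mathscr{P}^e$
satisfies Condition 1. With slight modification of the proofs in [4], it can be shown that (35) and (36)
still hold for any compact $\mathscr{P}^e$. Furthermore, we can show that the maximin and minimax values of the
games are equal in general, and there are always saddle-point strategies for both players of the games.
We define the minimax values associated with the above games:

Definition 11. The minimax value of the joint fingerprinting game is defined by

$$ \overline{C}_k^{\mathrm{joint}}(\mathscr{P}^e, \mathscr{P}^c) \triangleq \min_{p_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c} \; \max_{P_W \in \mathscr{P}^e} E_{P_W}\big[ I_k^{\mathrm{joint}}(\mathbf{W}, p_{Y|X_{\mathsf{K}}}) \big]. \tag{42} $$

Definition 12. The minimax value of the simple fingerprinting game is defined by

$$ \overline{C}_k^{\mathrm{simple}}(\mathscr{P}^e, \mathscr{P}^c) \triangleq \min_{p_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c_{\mathrm{fair}}} \; \max_{P_W \in \mathscr{P}^e} E_{P_W}\big[ I_k^{\mathrm{simple}}(\mathbf{W}, p_{Y|X_{\mathsf{K}}}) \big]. \tag{43} $$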
When Condition 1 is satisfied (for example, when $\mathscr{P}^e = \mathscr{P}_W$), the minimax games can be simplified
as follows:

Lemma 1. Assume that Condition 1 is satisfied. Then the minimax values of the joint and simple
fingerprinting games can be respectively written as

$$ \overline{C}_k^{\mathrm{joint}}(\mathscr{P}^e, \mathscr{P}^c) = \min_{p_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c} \; \max_{\mathbf{w} \in \mathrm{supp}(\mathscr{P}^e)} I_k^{\mathrm{joint}}(\mathbf{w}, p_{Y|X_{\mathsf{K}}}) \tag{44} $$

and

$$ \overline{C}_k^{\mathrm{simple}}(\mathscr{P}^e, \mathscr{P}^c) = \min_{p_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c_{\mathrm{fair}}} \; \max_{\mathbf{w} \in \mathrm{supp}(\mathscr{P}^e)} I_k^{\mathrm{simple}}(\mathbf{w}, p_{Y|X_{\mathsf{K}}}). \tag{45} $$

Proof: Note that randomization is no longer necessary for the minimax games of (42) and (43), so
they have the same respective values as (44) and (45). ■

We now present the main saddle-point property of the fingerprinting games. The results are due to the
convexity of the payoff function with respect to the minimizer's strategy. Such games are generally called
convex games [16, §2.5].

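Theorem 4. For compact $\mathscr{P}^e$, compact and user-symmetric $\mathscr{P}^c$, and for both the joint and simple games,
$\underline{C}_k(\mathscr{P}^e, \mathscr{P}^c) = \overline{C}_k(\mathscr{P}^e, \mathscr{P}^c)$. Suppose further that $\mathscr{P}^e$ and $\mathscr{P}^c$ are symbol-symmetric; then the first
argument $\mathscr{P}^e$ can be replaced by $\mathscr{P}^e_{\mathrm{sym}}$ and/or the second argument $\mathscr{P}^c$ can be replaced by $\mathscr{P}^c_{\mathrm{fair}}$, $\mathscr{P}^c_{\mathrm{sym}}$,
or $\mathscr{P}^c_{\mathrm{fair}} \cap \mathscr{P}^c_{\mathrm{sym}}$ without changing the minimax or the maximin value. For all these games, the minimizer
has an optimal strategy $p^{(k)}_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c_{\mathrm{fair}} \cap \mathscr{P}^c_{\mathrm{sym}}$ while the maximizer has an optimal strategy $P_W^{(k)} \in \mathscr{P}^e_{\mathrm{sym}}$.
In particular, when Condition 1 is satisfied, the maximizing strategy $P_W^{(k)} \in \mathscr{P}^e_{\mathrm{sym}}$ has a finite spectrum.
The values of all these games equal the (joint or simple) fingerprinting capacity $C_k(\mathscr{P}^e, \mathscr{P}^c)$.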

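Proof: We show that the functions $I_k^{\mathrm{joint}}$ and $I_k^{\mathrm{simple}}$ are convex functions of $p_{Y|X_{\mathsf{K}}}$ for fixed $\mathbf{w}$. The
convexity of $I_k^{\mathrm{joint}}$ is shown in [17, Theorem 2.7.4]. To show the convexity of $I_k^{\mathrm{simple}}(\mathbf{w}, p_{Y|X_{\mathsf{K}}})$, we fix
$\mathbf{w}$ and consider two different conditional distributions $p_{Y|X_{\mathsf{K}}}$ and $\tilde{p}_{Y|X_{\mathsf{K}}}$. Note that

$$ \begin{aligned} I_k^{\mathrm{simple}}(\mathbf{w}, p_{Y|X_{\mathsf{K}}}) &= I(X_1; Y | W = \mathbf{w}) \\ &= \sum_{x,y} p_{X_1|W}(x|\mathbf{w}) \, p_{Y|X_1 W}(y|x,\mathbf{w}) \log \frac{p_{Y|X_1 W}(y|x,\mathbf{w})}{p_{Y|W}(y|\mathbf{w})} \\ &= \sum_{x} p_{X_1|W}(x|\mathbf{w}) \, D(p_{Y|X_1 W} \,\|\, p_{Y|W} \mid W = \mathbf{w}). \end{aligned} \tag{46} $$

For any $0 \le \lambda \le 1$, we have

$$ \begin{aligned} & \lambda I_k^{\mathrm{simple}}(\mathbf{w}, p_{Y|X_{\mathsf{K}}}) + (1 - \lambda) I_k^{\mathrm{simple}}(\mathbf{w}, \tilde{p}_{Y|X_{\mathsf{K}}}) \\ &= \sum_{x} p_{X_1|W}(x|\mathbf{w}) \Big[ \lambda D(p_{Y|X_1 W} \,\|\, p_{Y|W} \mid W = \mathbf{w}) + (1 - \lambda) D(\tilde{p}_{Y|X_1 W} \,\|\, \tilde{p}_{Y|W} \mid W = \mathbf{w}) \Big] \\ &\ge \sum_{x} p_{X_1|W}(x|\mathbf{w}) \, D(p^{\lambda}_{Y|X_1 W} \,\|\, p^{\lambda}_{Y|W} \mid W = \mathbf{w}) \\ &= I_k^{\mathrm{simple}}(\mathbf{w}, p^{\lambda}_{Y|X_{\mathsf{K}}}) \end{aligned} \tag{47} $$

where $p^{\lambda}_{Y|X_{\mathsf{K}}} = \lambda p_{Y|X_{\mathsf{K}}} + (1 - \lambda) \tilde{p}_{Y|X_{\mathsf{K}}} \in \mathscr{P}^c$ by compactness. The inequality follows from the convexity
of relative entropy [17, Theorem 2.7.2]. Hence $I_k^{\mathrm{simple}}$ is convex in $p_{Y|X_{\mathsf{K}}}$.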

Now since $I_k(w, p_{Y|X_K})$ is a convex function of $p_{Y|X_K}$ for fixed $w$, $E_{P_W}[I_k(W, p_{Y|X_K})]$ is also
a convex function of $p_{Y|X_K}$ for fixed $P_W \in \mathscr{P}_W$. On the other hand, $E_{P_W}[I_k(W, p_{Y|X_K})]$ is a linear
function of $P_W$ for fixed $p_{Y|X_K}$. By the minimax theorem [18], the game admits a saddle-point solution.

If $\mathscr{P}_e$ is symbol-symmetric and $P_W$ is a maximizing saddle-point strategy, then by symbol-symmetry
each permuted distribution $P_W^{\pi} \in \mathscr{P}_e$ is a maximizing saddle-point strategy for any permutation $\pi$ of $\mathcal{X}$. The
symbol-permutation averaged distribution

$$\bar{P}_W = \frac{1}{|\mathcal{X}|!} \sum_{\pi} P_W^{\pi} \tag{48}$$

is also a maximizing saddle-point strategy and is symbol-symmetric by construction. Similarly if $\mathscr{P}_c$ is
user-symmetric and symbol-symmetric, we can construct a minimizing saddle-point strategy that is both
user-symmetric and symbol-symmetric.

Finally if $\mathscr{P}_e$ is the class of all probability distributions on $\mathrm{supp}(\mathscr{P}_e)$ (Condition 1), the game becomes
a so-called convex game whose maximizing strategy has a finite spectrum (see [16, §2.5]). ■

IV. Fingerprinting capacity for the binary alphabet 

In the following two sections we study intensively the joint and simple fingerprinting games for the
binary alphabet, i.e., $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ (see footnote 2). Tight upper and lower bounds on capacities are provided under
several different setups.

Footnote 2: In the case of the binary alphabet, the four variations of the marking assumption discussed in [8] are equivalent in terms of
capacity. Hence for simplicity we assume $\mathcal{X} = \mathcal{Y}$.

A. Game Definition

The mutual information games for the joint and simple decoders in the binary case can be simplified as
follows:

1) Fingerprinting Codes

The auxiliary random vector $W$ now has only one degree of freedom, and we redefine it as
$W \in [0, 1]$. $P_W$ denotes its distribution and $p_{X|W} \sim \mathrm{Bernoulli}(W)$.

Suppose $\mathscr{P}_e$ is compact and symbol-symmetric. Then by Theorem 4 it suffices to consider symbol-
symmetric $P_W$, which in the binary case means that the distribution of $W$ is symmetric about 1/2,
i.e.,

$$\Pr(W \le w) = \Pr(W \ge 1 - w), \quad w \in [0, 1]. \tag{49}$$

In the numerical results in Sec. IV-B we will consider a subset of the family of beta distributions,
which is a family of continuous probability distributions defined on (0, 1):

$$f_W(w) = \frac{1}{B(\theta, \theta)} [w(1 - w)]^{\theta - 1} \tag{50}$$

where the beta function, $B(\theta, \theta) = \int_0^1 [t(1 - t)]^{\theta - 1} \, dt$, appears as a normalization constant, and
the parameter $\theta > 0$. The arcsine distribution, which is a special case of the beta distribution with
$\theta = 1/2$, has pdf

$$f_W^*(w) = \frac{1}{\pi \sqrt{w(1 - w)}} \tag{51}$$

on (0, 1). The arcsine distribution was first used in generating randomized fingerprinting codes by
Tardos [9] and is sometimes referred to as the "Tardos distribution" in the literature.
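As a quick numerical illustration of the statement that the arcsine density is the symmetric beta density with $\theta = 1/2$, the following minimal sketch evaluates both. The function names are hypothetical and chosen only for this example; the normalization uses the standard identity $B(\theta, \theta) = \Gamma(\theta)^2 / \Gamma(2\theta)$.

```python
import math

def beta_norm(theta):
    """Normalization constant B(theta, theta) = Gamma(theta)^2 / Gamma(2*theta)."""
    return math.gamma(theta) ** 2 / math.gamma(2.0 * theta)

def symmetric_beta_pdf(w, theta):
    """Symmetric beta density [w(1-w)]^(theta-1) / B(theta, theta) on (0, 1)."""
    return (w * (1.0 - w)) ** (theta - 1.0) / beta_norm(theta)

def arcsine_pdf(w):
    """Arcsine density 1 / (pi * sqrt(w(1-w))) on (0, 1)."""
    return 1.0 / (math.pi * math.sqrt(w * (1.0 - w)))

if __name__ == "__main__":
    # The two columns should agree, since B(1/2, 1/2) = pi.
    for w in (0.1, 0.25, 0.5, 0.9):
        print(w, symmetric_beta_pdf(w, 0.5), arcsine_pdf(w))
```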

2) Collusion Channel 

Suppose $\mathscr{P}_c$ is compact and user- and symbol-symmetric. Then by Theorem 4 it suffices to consider
user-symmetric attacks. Let $Z = \sum_{i=1}^{k} X_i \in \{0, 1, \dots, k\}$, which is the number of 1's in $X_K$.
User-symmetry makes $Z$ a sufficient statistic in producing $Y$. If we let $p = (p_0, \dots, p_k)'$ where
$p_z = p_{Y|Z}(1|z)$, $z = 0, \dots, k$, then the collusion channel can be completely characterized by $p$.
The marking assumption enforces that

$$p_0 = 0 \quad \text{and} \quad p_k = 1. \tag{52}$$

On the other hand, symbol-symmetry allows us to consider $p$ with

$$p_z = 1 - p_{k-z}, \quad z = 0, \dots, k. \tag{53}$$

The interleaving attack p* (a.k.a. "uniform channel" in Q and "blind colluders" in (T9"]) defined by 

$$p_z^* = \frac{z}{k}, \quad z = 0, \dots, k \tag{54}$$

is frequently adopted to model the coalition's strategy and can be easily implemented by drawing
each $y_j$ randomly from $x_{1j}, \dots, x_{kj}$ at each position $j$. One can verify that it satisfies the
marking assumption (52) and is both user- and symbol-symmetric (53). We will further discuss the
performance of this attack in Sec. IV-C and Sec. V.
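The per-position sampling described above can be sketched as follows. This is only an illustration under the stated description (each output symbol is copied from a colluder chosen uniformly at random); the helper names are hypothetical.

```python
import random

def interleaving_attack(copies, rng):
    """Forge a sequence by copying, at each position, the symbol of a
    colluder drawn uniformly at random (hypothetical helper)."""
    length = len(copies[0])
    return [rng.choice(copies)[j] for j in range(length)]

def empirical_p(z, k, trials, rng):
    """Estimate p_z = Pr(Y = 1 | Z = z) for one position in which z of the
    k colluders hold the symbol 1."""
    column = [[1]] * z + [[0]] * (k - z)
    ones = sum(interleaving_attack(column, rng)[0] for _ in range(trials))
    return ones / trials

if __name__ == "__main__":
    rng = random.Random(0)
    # Positions 0 and 1 are undetectable (all colluders agree), so the
    # marking assumption forces the forgery to keep those symbols.
    copies = [[0, 1, 0, 1], [0, 1, 1, 0], [0, 1, 1, 1]]
    print(interleaving_attack(copies, rng))
    print(empirical_p(2, 3, 20000, rng))  # should be close to 2/3
```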

3) Payoff Functions 

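Let $\alpha(w) = (\alpha_0(w), \dots, \alpha_k(w))'$ and similarly for $\alpha^1(w)$ and $\alpha^0(w)$ where

$$\alpha_z(w) \triangleq p_{Z|W}(z|w) = \binom{k}{z} w^z (1 - w)^{k - z} \tag{55}$$

is the binomial law with parameter $w$ and $k$ trials, and

$$\alpha_z^1(w) \triangleq p_{Z|X_1 W}(z|1, w) = \begin{cases} \binom{k-1}{z-1} w^{z-1} (1 - w)^{k - z}, & 1 \le z \le k \\ 0, & z = 0 \end{cases} \tag{56}$$

and

$$\alpha_z^0(w) \triangleq p_{Z|X_1 W}(z|0, w) = \begin{cases} \binom{k-1}{z} w^{z} (1 - w)^{k - z - 1}, & 0 \le z \le k - 1 \\ 0, & z = k \end{cases} \tag{57}$$

are the (shifted for $\alpha^1(w)$) binomial laws with parameter $w$ and $k - 1$ trials.

Recall that $W \to X_K \to Z \to Y$ forms a Markov chain. We have

$$p_{Y|X_1 W}(1|x, w) = \sum_{z=0}^{k} p_{Z|X_1 W}(z|x, w)\, p_{Y|Z}(1|z) = \sum_{z=0}^{k} \alpha_z^x(w)\, p_z = \alpha^x(w)' p \tag{58}$$

for $x = 0, 1$. The payoff function for the joint fingerprinting game is then

$$\begin{aligned}
I_k^{\mathrm{joint}}(w, p) &= \frac{1}{k} I(X_K; Y \mid W = w) \\
&= \frac{1}{k} I(Z; Y \mid W = w) \\
&= \frac{1}{k} \left[ H(Y \mid W = w) - H(Y \mid Z, W = w) \right] \\
&= \frac{1}{k} \left[ h\Big( \sum_{z=0}^{k} \alpha_z(w) p_z \Big) - \sum_{z=0}^{k} \alpha_z(w) h(p_z) \right] \\
&= \frac{1}{k} \left[ h(\alpha' p) - \alpha' h(p) \right].
\end{aligned} \tag{59}$$

Another representation of $I_k^{\mathrm{joint}}$ is

$$\begin{aligned}
I_k^{\mathrm{joint}}(w, p) &= \frac{1}{k} D\big(p_{ZY|W} \,\|\, p_{Z|W}\, p_{Y|W} \mid W = w\big) \\
&= \frac{1}{k} \sum_{z=0}^{k} \sum_{y=0}^{1} p_{Z|W}(z|w)\, p_{Y|Z}(y|z) \log \frac{p_{Y|Z}(y|z)}{p_{Y|W}(y|w)} \\
&= \frac{1}{k} \sum_{z=0}^{k} \alpha_z(w) \left[ p_z \log \frac{p_z}{\alpha' p} + (1 - p_z) \log \frac{1 - p_z}{1 - \alpha' p} \right] \\
&= \frac{1}{k} \sum_{z=0}^{k} \alpha_z(w)\, d(p_z \,\|\, \alpha' p).
\end{aligned} \tag{60}$$

For the simple fingerprinting game, we have

$$\begin{aligned}
I_k^{\mathrm{simple}}(w, p) &= I(X_1; Y \mid W = w) \\
&= D\big(p_{X_1 Y|W} \,\|\, p_{X_1|W}\, p_{Y|W} \mid W = w\big) \\
&= \sum_{x=0}^{1} \sum_{y=0}^{1} p_{X_1|W}(x|w)\, p_{Y|X_1 W}(y|x, w) \log \frac{p_{Y|X_1 W}(y|x, w)}{p_{Y|W}(y|w)} \\
&= w\, D\big(p_{Y|X_1 = 1, W} \,\|\, p_{Y|W}\big) + (1 - w)\, D\big(p_{Y|X_1 = 0, W} \,\|\, p_{Y|W}\big) \\
&\overset{(a)}{=} w\, d(\alpha^{1\prime} p \,\|\, \alpha' p) + (1 - w)\, d(\alpha^{0\prime} p \,\|\, \alpha' p)
\end{aligned} \tag{61}$$

where (a) follows from (58).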
4) Fingerprinting Games

The fingerprinting games for the binary alphabet under the marking assumption can now be written
as

$$C_k(\mathscr{P}_e, \mathscr{P}_c) = \max_{P_W} \min_{p} E_{P_W}[I_k(W, p)] \tag{62}$$

$$\phantom{C_k(\mathscr{P}_e, \mathscr{P}_c)} = \min_{p} \max_{P_W} E_{P_W}[I_k(W, p)] \tag{63}$$

where the maximization is subject to $P_W \in \mathscr{P}_e$ and the symbol-symmetry condition (49) while
the minimization is subject to $p \in \mathscr{P}_c$ and the symbol-symmetry condition (53). The maximizing
and minimizing strategies are denoted by $P_W^{(k)}$ and $p^{(k)}$ respectively. If $\mathscr{P}_e$ satisfies Condition 1
then by Lemma 1 we have

$$C_k(\mathscr{P}_e, \mathscr{P}_c) = \min_{p} \max_{w} I_k(w, p) \tag{64}$$

where the maximization is subject to $w \in \mathrm{supp}(\mathscr{P}_e)$.
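The payoff functions of (59) to (61) can be evaluated directly from the binomial laws (55) to (57). The following sketch is one way to do so; all function names are hypothetical, logarithms are base 2, and the helper `d` assumes its second argument lies strictly inside (0, 1), which holds for interior $w$ under the marking assumption.

```python
import math

def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def d(r, s):
    """Binary divergence d(r || s) in bits, assuming 0 < s < 1."""
    total = 0.0
    if r > 0.0:
        total += r * math.log2(r / s)
    if r < 1.0:
        total += (1.0 - r) * math.log2((1.0 - r) / (1.0 - s))
    return total

def alpha(k, w):   # binomial law (55)
    return [math.comb(k, z) * w**z * (1 - w)**(k - z) for z in range(k + 1)]

def alpha1(k, w):  # shifted law (56): Z given X_1 = 1
    return [0.0] + [math.comb(k - 1, z - 1) * w**(z - 1) * (1 - w)**(k - z)
                    for z in range(1, k + 1)]

def alpha0(k, w):  # law (57): Z given X_1 = 0
    return [math.comb(k - 1, z) * w**z * (1 - w)**(k - z - 1)
            for z in range(k)] + [0.0]

def dot(a, p):
    return sum(x * y for x, y in zip(a, p))

def joint_payoff(w, p):      # entropy form (59)
    k = len(p) - 1
    a = alpha(k, w)
    return (h(dot(a, p)) - sum(az * h(pz) for az, pz in zip(a, p))) / k

def joint_payoff_div(w, p):  # divergence form (60)
    k = len(p) - 1
    a = alpha(k, w)
    g = dot(a, p)
    return sum(az * d(pz, g) for az, pz in zip(a, p)) / k

def simple_payoff(w, p):     # (61)
    k = len(p) - 1
    g = dot(alpha(k, w), p)
    return (w * d(dot(alpha1(k, w), p), g)
            + (1 - w) * d(dot(alpha0(k, w), p), g))

if __name__ == "__main__":
    k = 5
    p_star = [z / k for z in range(k + 1)]  # interleaving attack (54)
    print(joint_payoff(0.3, p_star), joint_payoff_div(0.3, p_star))
    print(simple_payoff(0.3, p_star))
```

The two joint forms should agree up to rounding, which gives a simple consistency check on (59) and (60).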

B. Analysis of the Convex Games 

We consider the following three cases: 

1) Colluders' Strategy is Fixed 

$\mathscr{P}_c = \{p\}$ in this case. For general $\mathscr{P}_e$ the game is still an infinite-dimensional maximization
problem. However, when Condition 1 is satisfied, it reduces to a one-dimensional problem by (64), and a
simple line search gives us the capacity under collusion channel $p$. Note that for any $p \in \mathscr{P}_{\mathrm{mark}}$,
$C_k(\mathscr{P}_e, p)$ is an upper bound on $C_k(\mathscr{P}_e, \mathscr{P}_{\mathrm{mark}})$.

2) Fingerprinting Embedder's Strategy is Fixed

$\mathscr{P}_e = \{P_W\}$ in this case. The game reduces to a $\Theta(k)$-dimensional minimization problem. Since the
payoff function $E_{P_W}[I_k(W, p)]$ is convex in $p$, we use the conditional gradient method to solve the
constrained convex optimization problem (62) (see [20]). For the joint fingerprinting game, Furon
and Pérez-Freire [6] proposed a Blahut-Arimoto algorithm which, however, cannot be applied to
the simple fingerprinting game. Note that for any $P_W \in \mathscr{P}_W$, $C_k(P_W, \mathscr{P}_c)$ is a lower bound on
$C_k(\mathscr{P}_W, \mathscr{P}_c)$.

3) Fingerprinting Capacities Under the Marking Assumption 

We consider specifically $\mathscr{P}_e = \mathscr{P}_W$ and $\mathscr{P}_c = \mathscr{P}_{\mathrm{mark}}$. Solving the maximin game of (62) or the
minimax game of (64) is much more difficult than solving the above maximization or minimization
problems. In particular, the alternating maximization and minimization algorithm generally diverges.
Owing to the existence of a saddle-point solution, $p^{(k)}$ and $P_W^{(k)}$ (note that the latter is a pmf by Theorem
4) must satisfy the following:

a) When $p = p^{(k)}$ is fixed, $I(w, p^{(k)})$ is a differentiable function over the unit interval. The
support $\mathrm{supp}(P_W^{(k)})$ of $P_W^{(k)}$ can only take values at the maximizers of $I(w, p^{(k)})$ [16, §2.5].

Fig. 2. Capacities $C_k(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}})$, $C_k(\mathscr{P}_W, p^*)$, $C_k(f_W^*, \mathscr{P}_{\mathrm{mark}})$, and upper and lower bounds of the (a) joint and (b)
simple fingerprinting games



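Hence we have

$$I(w, p^{(k)}) = C_k(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}}) \quad \text{and} \quad \frac{\partial}{\partial w} I(w, p^{(k)}) = 0, \qquad \forall\, w \in \mathrm{supp}\big(P_W^{(k)}\big). \tag{65}$$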



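b) When $P_W = P_W^{(k)}$ is fixed and the constraint (53) is imposed, the cardinality of
$\mathrm{supp}\big(P_W^{(k)}\big)$ is bounded in terms of the number of free components of $p$ (see [16, §2.5]).
By the convexity in $p$ of the payoff function, we have

$$\frac{\partial}{\partial p_z} E_{P_W^{(k)}}\big[I(W, p)\big] \Big|_{p = p^{(k)}} = 0, \qquad z = 1, \dots, k - 1. \tag{66}$$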

With a fixed spectrum cardinality, we can obtain candidate capacity-achieving strategies $P_W^{(k)}$ and
$p^{(k)}$ by solving (65) and (66), and then verify whether those candidate distributions are optimal
by examining the second partial derivatives. Once $p^{(k)}$ and $P_W^{(k)}$ are found, we can evaluate
$C_k(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}})$ by substituting them into (62).
Numerical solutions to the joint and simple fingerprinting games are shown in Figs. 2 to 4. Observe that
capacities for both games (Fig. 2(a)-(b)), the optimal distributions $P_W^{(k)}$ for both games (Fig. 3(a)-(b)), and
the optimal attacks $p^{(k)}$ for the joint fingerprinting game (Fig. 4(a)) all seem to converge as $k$ grows. The
optimal attacks $p^{(k)}$ for the simple fingerprinting game (Fig. 4(b)), however, exhibit some wild oscillations
in both amplitude and frequency as $k$ grows. We will study the asymptotics of the joint fingerprinting
game for large $k$ in the next section.
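The one-dimensional search of case 1) (colluders' strategy fixed, Condition 1 satisfied) can be sketched as below for the joint game. This is only an illustration: it uses a plain uniform grid over $w$ rather than any particular line-search method, since unimodality of the payoff in $w$ is not assumed here, and the function names are hypothetical.

```python
import math

def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def joint_payoff(w, p):
    """Joint payoff (1/k)[h(alpha'p) - alpha'h(p)] for a channel vector p."""
    k = len(p) - 1
    a = [math.comb(k, z) * w**z * (1 - w)**(k - z) for z in range(k + 1)]
    g = sum(az * pz for az, pz in zip(a, p))
    return (h(g) - sum(az * h(pz) for az, pz in zip(a, p))) / k

def grid_search_capacity(p, grid=400):
    """Approximate max over w in (0, 1) of the joint payoff for fixed p."""
    best_w, best_val = None, -1.0
    for i in range(1, grid):
        w = i / grid
        val = joint_payoff(w, p)
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val

if __name__ == "__main__":
    k = 10
    p_star = [z / k for z in range(k + 1)]  # interleaving attack
    w_best, cap = grid_search_capacity(p_star)
    print(w_best, cap, 1.0 / (k * k * math.log(2)))
```

Because any fixed $p$ in the marking class gives an upper bound on the capacity under the marking assumption, the value returned for the interleaving attack can be compared against the bounds derived in the next subsection.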



C. Capacity Bounds 

The analysis of Sec. IV-B allows us to solve the fingerprinting game numerically for small $k$. However,
evaluating or even approximating the capacity value for large $k$ is still a difficult task. In this subsection,
we provide tight upper and lower bounds on capacity.

Fig. 4. The interleaving attack $p^*$ and the optimal attacks $p^{(k)}$ for the (a) joint and (b) simple fingerprinting games under the
marking assumption (horizontal axis: $z/k$)
For simplicity of notation, we let

$$g_k(w) = p_{Y|W}(1|w) = \sum_{z=0}^{k} \alpha_z(w)\, p_z = \alpha(w)' p \tag{67}$$

which by the definition of $\alpha_z(w)$ in (55) is a polynomial in $w$ of degree $\le k$. Note that $g_k(0) = p_0 = 0$
and $g_k(1) = p_k = 1$ by the marking assumption.
The following lemmas will be useful for the proofs:



Lemma 2 (Pinsker's inequality). [17, Lemma 11.6.1]

$$d(r \,\|\, s) \ge \frac{2}{\ln 2} (r - s)^2.$$

Lemma 3. Equalities

$$\alpha^{1\prime} p - \alpha' p = \frac{1 - w}{k}\, g_k'(w) \tag{68}$$

and

$$\alpha^{0\prime} p - \alpha' p = -\frac{w}{k}\, g_k'(w)$$

hold for $p = (p_0, \dots, p_k)'$ and $w \in [0, 1]$.

Proof: The equalities follow directly from (55), (56), (57), and (67). ■
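Lemma 3 can be checked numerically for an arbitrary channel vector. The sketch below is an illustration only; it uses the standard derivative formula for a polynomial in Bernstein form, $g_k'(w) = k \sum_{z=0}^{k-1} \binom{k-1}{z} w^z (1-w)^{k-1-z} (p_{z+1} - p_z)$, and all function names are hypothetical.

```python
import math

def binom_pmf(n, z, w):
    return math.comb(n, z) * w**z * (1 - w)**(n - z)

def g_k(w, p):
    """g_k(w) = sum_z alpha_z(w) p_z, as in (67)."""
    k = len(p) - 1
    return sum(binom_pmf(k, z, w) * p[z] for z in range(k + 1))

def g_k_prime(w, p):
    """Derivative of g_k via the Bernstein-form derivative identity."""
    k = len(p) - 1
    return k * sum(binom_pmf(k - 1, z, w) * (p[z + 1] - p[z]) for z in range(k))

def alpha_x_dot_p(x, w, p):
    """alpha^x(w)' p for x in {0, 1}: given X_1 = x, Z = x + Binomial(k-1, w)."""
    k = len(p) - 1
    return sum(binom_pmf(k - 1, z, w) * p[z + x] for z in range(k))

if __name__ == "__main__":
    p = [0.0, 0.1, 0.7, 0.4, 1.0]  # arbitrary channel with p_0 = 0, p_k = 1
    k, w = len(p) - 1, 0.37
    print(alpha_x_dot_p(1, w, p) - g_k(w, p), (1 - w) / k * g_k_prime(w, p))
    print(alpha_x_dot_p(0, w, p) - g_k(w, p), -w / k * g_k_prime(w, p))
```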



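Lemma 4. Let $f_W$ be a pdf on $[0, 1]$. Then

$$\int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \ge \pi^2 \tag{69}$$

with equality if and only if $f_W$ is the arcsine distribution of (51).

Proof: By the Cauchy-Schwarz inequality, we have

$$\int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} = \int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \int_0^1 f_W(w)\, dw \ge \left[ \int_0^1 \frac{1}{\sqrt{w(1 - w)}}\, dw \right]^2 = \pi^2.$$

Equality holds if and only if $f_W(w) \propto \frac{1}{\sqrt{w(1 - w)}}$, which leads us to the arcsine distribution. ■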
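For the symmetric beta family of (50), the integral in Lemma 4 has a closed form, which makes the lemma easy to check numerically: with $f_W = \mathrm{Beta}(\theta, \theta)$ and $0 < \theta < 1$, the integrand is $B(\theta, \theta)\,[w(1-w)]^{-\theta}$, so the integral equals $B(\theta, \theta)\, B(1 - \theta, 1 - \theta)$. The sketch below evaluates this product; the function names are hypothetical.

```python
import math

def beta_fn(a, b):
    """Beta function via the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def lemma4_integral(theta):
    """Integral of 1 / (f(w) w (1-w)) over (0, 1) for f = Beta(theta, theta),
    valid for 0 < theta < 1."""
    return beta_fn(theta, theta) * beta_fn(1.0 - theta, 1.0 - theta)

if __name__ == "__main__":
    for theta in (1 / 3, 1 / 2, 2 / 3):
        print(theta, lemma4_integral(theta), math.pi ** 2)
```

The value should equal $\pi^2$ at $\theta = 1/2$ (the arcsine distribution) and exceed it elsewhere.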
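1) Upper Bounds: The following two theorems bound from above capacities under the interleaving
attack of (54).

Theorem 5. [21, Theorem 4.2]

$$C_k^{\mathrm{joint}}(\mathscr{P}_W, p^*) \le \frac{1}{k^2 \ln 2}.$$

Proof:

$$\begin{aligned}
C_k^{\mathrm{joint}}(\mathscr{P}_W, p^*) &= \max_{w \in [0,1]} I_k^{\mathrm{joint}}(w, p^*) \\
&\overset{(a)}{=} \frac{1}{k} \max_{w \in [0,1]} \left\{ h(w) - \sum_{z=0}^{k} \alpha_z(w)\, h\!\left( \frac{z}{k} \right) \right\} \\
&\overset{(b)}{\le} \frac{1}{k^2 \ln 2}
\end{aligned} \tag{70}$$

where (a) follows from (59) and (b) results from [7, Theorem 4.3]. ■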



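Theorem 6. [22, Proposition 4.2]

$$C_k^{\mathrm{simple}}(\mathscr{P}_W, p^*) = \frac{1}{k^2\, 2 \ln 2} + O\!\left( \frac{1}{k^4} \right). \tag{71}$$

Proof: It can be shown that $I_k^{\mathrm{simple}}(w, p^*)$ takes its maximum at $w = 1/2$ (see the Appendix).
Hence

$$C_k^{\mathrm{simple}}(\mathscr{P}_W, p^*) = \max_{w \in [0,1]} I_k^{\mathrm{simple}}(w, p^*) = I_k^{\mathrm{simple}}\!\left( \tfrac{1}{2}, p^* \right) = \frac{1}{k^2\, 2 \ln 2} + O\!\left( \frac{1}{k^4} \right).$$
■

2) Lower Bounds: The following theorem provides a lower bound on both the joint and simple
capacities under a continuous probability distribution $f_W$.

Theorem 7. Let $f_W$ be the pdf of a continuous probability distribution on $[0, 1]$. Then

$$C_k^{\mathrm{joint}}(f_W, \mathscr{P}_{\mathrm{mark}}) \ge C_k^{\mathrm{simple}}(f_W, \mathscr{P}_{\mathrm{mark}}) \ge \frac{2}{k^2 \ln 2} \left[ \int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \right]^{-1}. \tag{72}$$

The lower bound is maximized when $f_W = f_W^*$, where it takes the value $\frac{2}{k^2 \pi^2 \ln 2}$.

Proof: For any $p \in \mathscr{P}_{\mathrm{mark}}$, we have

$$\begin{aligned}
E_{f_W}\big[ I_k^{\mathrm{simple}}(W, p) \big]
&\overset{(a)}{=} \int_0^1 \left[ w\, d(\alpha^{1\prime} p \,\|\, \alpha' p) + (1 - w)\, d(\alpha^{0\prime} p \,\|\, \alpha' p) \right] f_W(w)\, dw \\
&\overset{(b)}{\ge} \frac{2}{\ln 2} \int_0^1 \left[ w (\alpha^{1\prime} p - \alpha' p)^2 + (1 - w)(\alpha^{0\prime} p - \alpha' p)^2 \right] f_W(w)\, dw \\
&\overset{(c)}{=} \frac{2}{k^2 \ln 2} \int_0^1 [g_k'(w)]^2\, w(1 - w)\, f_W(w)\, dw \\
&\overset{(d)}{\ge} \frac{2}{k^2 \ln 2} \left[ \int_0^1 g_k'(w)\, dw \right]^2 \left[ \int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \right]^{-1} \\
&\overset{(e)}{=} \frac{2}{k^2 \ln 2} \left[ \int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \right]^{-1}.
\end{aligned}$$

(a) follows from (61). (b) follows from Pinsker's inequality (Lemma 2). (c) follows from Lemma 3. (d)
follows from the Cauchy-Schwarz inequality. Finally, (e) follows from the marking assumption, which gives
$\int_0^1 g_k'(w)\, dw = g_k(1) - g_k(0) = 1$. Hence

$$C_k^{\mathrm{simple}}(f_W, \mathscr{P}_{\mathrm{mark}}) = \min_{p \in \mathscr{P}_{\mathrm{mark}}} E_{f_W}\big[ I_k^{\mathrm{simple}}(W, p) \big] \ge \frac{2}{k^2 \ln 2} \left[ \int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \right]^{-1}.$$

Following Lemma 4, the lower bound is maximized when $f_W = f_W^*$, which coincides with the lower
bound given in [22]. ■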
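The value in Theorem 6 and the lower bound of Theorem 7 are easy to evaluate numerically. The sketch below is an illustration only: it evaluates the simple payoff under the interleaving attack using the identities $\alpha' p^* = w$, $\alpha^{1\prime} p^* = w + (1-w)/k$ and $\alpha^{0\prime} p^* = w - w/k$ (which follow from (54) and Lemma 3), and compares the value at $w = 1/2$ with $1/(k^2\, 2 \ln 2)$ and with the arcsine lower bound $2/(k^2 \pi^2 \ln 2)$. Function names are hypothetical.

```python
import math

def d(r, s):
    """Binary divergence d(r || s) in bits, assuming 0 < s < 1."""
    total = 0.0
    if r > 0.0:
        total += r * math.log2(r / s)
    if r < 1.0:
        total += (1.0 - r) * math.log2((1.0 - r) / (1.0 - s))
    return total

def simple_payoff_interleaving(w, k):
    """Simple payoff (61) evaluated at the interleaving attack p*_z = z/k."""
    return w * d(w + (1 - w) / k, w) + (1 - w) * d(w - w / k, w)

def capacity_simple_interleaving(k):
    """Value at w = 1/2, where the payoff is maximized (see the Appendix)."""
    return simple_payoff_interleaving(0.5, k)

if __name__ == "__main__":
    for k in (2, 5, 10, 50):
        cap = capacity_simple_interleaving(k)
        leading = 1.0 / (k * k * 2 * math.log(2))
        lower = 2.0 / (k * k * math.pi ** 2 * math.log(2))
        print(k, cap, leading, lower)
```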
The following corollaries summarize the upper and lower bounds on capacities under the marking
assumption:

Corollary 2.

$$\frac{2}{k^2 \pi^2 \ln 2} \le C_k^{\mathrm{joint}}(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}}) \le \frac{1}{k^2 \ln 2}. \tag{73}$$

Corollary 3.

$$\frac{2}{k^2 \pi^2 \ln 2} \le C_k^{\mathrm{simple}}(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}}) \le \frac{1}{k^2\, 2 \ln 2} + O\!\left( \frac{1}{k^4} \right). \tag{74}$$

V. Asymptotics for Large Coalitions

The upper and lower bounds on $C_k^{\mathrm{joint}}(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}})$ provided in the previous section are within a
factor of about five (the ratio of the bounds in (73) is $\pi^2 / 2$). As can be seen in Fig. 2, the numerical results suggest that $C_k^{\mathrm{joint}}(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}})$
approximates $(k^2\, 2 \ln 2)^{-1}$ even for small values of $k$. Amiri and Tardos [23] claimed the same asymptotic
rate but only provided the proof for the lower bound in [23, Theorem 15]. Here we analyze not only this
rate but the complete asymptotics of the joint fingerprinting game.

A. Asymptotic Analysis

We consider the sequence of mutual information games for joint decoding. To study the asymptotics
when $k \to \infty$, we first assume that the collusion channel $p$ satisfies the following regularity condition:

Condition 2. There exists a bounded and twice differentiable function $g(w)$ on $[0, 1]$ with $g(0) = 0$ and
$g(1) = 1$ such that

$$p_z = g\!\left( \frac{z}{k} \right), \quad \forall z \in \{0, \dots, k\}. \tag{75}$$

Certainly the condition restricts the colluders' strategy to a smaller space. We however claim that this 
is a very mild limitation on their power for the following reasons: 

1) For each k, the collusion channels take values of g at only the lattice points in [0, 1], hence intuitively 
the class of collusion channels satisfying Condition [2] remains large. 

2) Fig. 5 shows the minimizing collusion channels $p^{(k)}$ for several different embedding distributions.
For each case it seems the continuous interpolation of $p^{(k)}$ does converge to some $g$ on $[0, 1]$. Indeed,
our following analysis still holds if we relax the restriction of (75) to

$$p_z = g\!\left( \frac{z}{k} \right) + o\!\left( \frac{1}{k} \right), \quad \forall z \in \{0, \dots, k\}. \tag{76}$$

The following reparameterization of the class (75) of collusion channels will simplify our analysis:

Definition 13. Let $G$ and $J$ be functions on $[0, 1]$ defined as

$$G(w) = \cos^{-1}[1 - 2g(w)] \tag{77}$$

and

$$J(w) = w(1 - w)[G'(w)]^2 \tag{78}$$

where $g(w)$ satisfies Condition 2.

The outline of our asymptotic analysis is as follows: we fix $w \in (0, 1)$ and we study the asymptotics of
$I_k^{\mathrm{joint}}(w, p)$. The binomial distribution of $Z$ can be approximated by the Gaussian distribution with mean
$kw$ and variance $kw(1 - w)$, by which we can approximate the dominating terms of $I_k^{\mathrm{joint}}(w, p)$.
Theorem 8 yields $I_k^{\mathrm{joint}}(w, p) \sim J(w) / (k^2\, 2 \ln 2)$, where $J$ is the transformation of $g$ defined in (78).
The maximin game with $J$ as the payoff function can be solved explicitly and hence the asymptotics of
the fingerprinting game can be obtained.

The following lemmas will be useful for our analysis:

Lemma 5. [24, Sec. 2.5]

$$d(r \,\|\, s) = \frac{(r - s)^2}{s(1 - s)\, 2 \ln 2} + O\big( |r - s|^3 \big). \tag{79}$$

Lemma 6. For $Z \sim \mathrm{Binomial}(k, w)$, we have

$$\Pr\big[ |Z - kw| > \sqrt{k \ln k} \big] \le 1/k^2. \tag{80}$$

Proof: This is a special case of Hoeffding's inequality (see [25]). ■
Recall that the expectation of $Y$ given $W = w$, which we denote by $g_k(w)$ in (67), can be written as

$$g_k(w) = p_{Y|W}(1|w) = \sum_{z=0}^{k} \alpha_z(w)\, p_z = \sum_{z=0}^{k} \alpha_z(w)\, g\!\left( \frac{z}{k} \right)$$

where

$$\alpha_z(w) = p_{Z|W}(z|w) = \binom{k}{z} w^z (1 - w)^{k - z}$$

is the binomial pmf which concentrates around its mean $kw$ as $k \to \infty$. $g_k(w)$ is a polynomial in $w$ of
degree $\le k$ and is known as the Bernstein polynomial of order $k$ of the function $g$ [26]. By Condition
2, $g$ is bounded and the second derivative $g''(w)$ exists; from Bernstein [26, §1.6] we have

$$g_k(w) = g(w) + \frac{w(1 - w)}{2k}\, g''(w) + o\!\left( \frac{1}{k} \right). \tag{81}$$

On the other hand, for $z = k(w + \epsilon)$, we have

$$p_z = g\!\left( \frac{z}{k} \right) = g(w + \epsilon) = g(w) + \epsilon\, g'(w) + O(\epsilon^2). \tag{82}$$
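The expansion (81) can be illustrated with test functions for which the Bernstein polynomial is known in closed form. For $g(w) = w^2$ the binomial moments give exactly $g_k(w) = w^2 + w(1-w)/k$, which matches (81) with $g'' = 2$ and no remainder; for $g(w) = w^3$ the remainder is $w(1-w)(1-2w)/k^2 = o(1/k)$. The sketch below is an illustration only, with hypothetical function names.

```python
import math

def bernstein(g, k, w):
    """Bernstein polynomial of order k of g, i.e. sum_z alpha_z(w) g(z/k)."""
    return sum(math.comb(k, z) * w**z * (1 - w)**(k - z) * g(z / k)
               for z in range(k + 1))

def second_order_approx(g, g2, k, w):
    """Right-hand side of (81) without the o(1/k) term; g2 is g''."""
    return g(w) + w * (1 - w) / (2 * k) * g2(w)

if __name__ == "__main__":
    square = lambda x: x * x
    cube = lambda x: x ** 3
    k, w = 50, 0.3
    print(bernstein(square, k, w), second_order_approx(square, lambda x: 2.0, k, w))
    print(bernstein(cube, k, w), second_order_approx(cube, lambda x: 6.0 * x, k, w))
```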
We now write the asymptotic approximation of $I_k^{\mathrm{joint}}$ in terms of $g$.

Firstly, by the bounds presented in Sec. IV-C we can focus on $w$ and $g$ such that $I_k^{\mathrm{joint}}(w, p) = \Omega(1/k^2)$.
Let $\delta = \sqrt{\ln k / k}$. We have

$$\begin{aligned}
I_k^{\mathrm{joint}}(w, p) &\overset{(a)}{=} \frac{1}{k} \sum_{z=0}^{k} \alpha_z(w)\, d(p_z \,\|\, g_k(w)) \\
&\overset{(b)}{\sim} \frac{1}{k} \sum_{z :\, |z - kw| < k\delta} \alpha_z(w)\, d(p_z \,\|\, g_k(w))
\end{aligned} \tag{83}$$

where (a) follows from (60) and (67) and (b) from Lemma 6.

Now if we let $z = kw + \eta$, where $\eta = O(\sqrt{k \ln k})$, then by (82) we have

$$p_z = g(w) + \frac{\eta}{k}\, g'(w) + O\!\left( \frac{\eta^2}{k^2} \right) \tag{84}$$

and combining with (81) we have

$$p_z - g_k(w) = \frac{\eta}{k}\, g'(w) + o\!\left( \frac{\eta}{k} \right) \tag{85}$$

for $\eta = \omega(1)$. The contribution to (83) for $\eta = O(1)$ decays exponentially with $k$ and thus can be
neglected.

By (85) and Lemma 5 we have

$$d(p_z \,\|\, g_k(w)) = \frac{[p_z - g_k(w)]^2}{g_k(w)(1 - g_k(w))\, 2 \ln 2} + O\big( |p_z - g_k(w)|^3 \big) \tag{86}$$

and hence (83) yields

$$\begin{aligned}
I_k^{\mathrm{joint}}(w, p) &\overset{(a)}{\sim} \frac{1}{k\, 2 \ln 2} \sum_{z :\, |z - kw| < k\delta} \alpha_z(w)\, \frac{[p_z - g_k(w)]^2}{g_k(w)(1 - g_k(w))} \\
&\overset{(b)}{\sim} \frac{[g'(w)]^2}{k^3\, g(w)(1 - g(w))\, 2 \ln 2} \sum_{z :\, |z - kw| < k\delta} \alpha_z(w)\, (z - kw)^2 \\
&\overset{(c)}{\sim} \frac{[g'(w)]^2\, w(1 - w)}{k^2\, g(w)(1 - g(w))\, 2 \ln 2} \\
&\overset{(d)}{=} \frac{1}{k^2\, 2 \ln 2}\, J(w)
\end{aligned} \tag{87}$$

where (a) follows from (86), (b) from (81) and (85), (c) from Lemma 6, and (d) directly from the
definitions in (77) and (78). The following theorem concludes what we have proved thus far:



Theorem 8. Assume that Condition 2 is satisfied; then

$$I_k^{\mathrm{joint}}(w, p) \sim \frac{1}{k^2\, 2 \ln 2}\, J(w), \quad \forall w \in (0, 1). \tag{88}$$
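Theorem 8 can be illustrated numerically by comparing the normalized payoff $k^2\, 2 \ln 2 \cdot I_k^{\mathrm{joint}}(w, p)$ with $J(w)$ for a moderately large $k$. The sketch below is an illustration under two assumed test functions: $g(w) = w$ (for which $J(w) = 1$) and $g(w) = \sin^2(\pi w / 2)$ (for which $G(w) = \pi w$ and $J(w) = \pi^2 w(1-w)$). It uses the identity $G'(w) = g'(w) / \sqrt{g(w)(1 - g(w))}$, obtained by differentiating (77). Function names are hypothetical.

```python
import math

def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def joint_payoff(w, p):
    """Joint payoff (1/k)[h(alpha'p) - alpha'h(p)], as in (59)."""
    k = len(p) - 1
    a = [math.comb(k, z) * w**z * (1 - w)**(k - z) for z in range(k + 1)]
    g = sum(az * pz for az, pz in zip(a, p))
    return (h(g) - sum(az * h(pz) for az, pz in zip(a, p))) / k

def normalized_payoff(w, g, k):
    """k^2 * 2 ln 2 * I_joint(w, p) with p_z = g(z/k), as in (75)."""
    p = [g(z / k) for z in range(k + 1)]
    return k * k * 2 * math.log(2) * joint_payoff(w, p)

def J_from_g(w, g, g_prime):
    """J(w) = w(1-w) g'(w)^2 / (g(w)(1 - g(w))), equivalent to (78)."""
    gw = g(w)
    return w * (1 - w) * g_prime(w) ** 2 / (gw * (1 - gw))

if __name__ == "__main__":
    g_sin = lambda x: math.sin(math.pi * x / 2) ** 2
    g_sin_prime = lambda x: (math.pi / 2) * math.sin(math.pi * x)
    for k in (50, 100, 400):
        print(k, normalized_payoff(0.5, g_sin, k), J_from_g(0.5, g_sin, g_sin_prime))
```

The values of $k$ are kept moderate because the binomial coefficients are converted to floating point in this simple implementation.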



The joint fingerprinting game of (64) can now be approximated by the game with $J$ as its payoff
function. We consider continuous probability distributions $f_W$ satisfying the following condition:

Condition 3. The pdf $f_W$ is continuous on $[0, 1]$ with

$$\int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} < \infty. \tag{89}$$

The following lemma shows the solution to the minimization problem with $J$ as its payoff function.

Lemma 7. Let $g(w)$ satisfy Condition 2 and fix $f_W$ satisfying Condition 3. Then

$$\min_{g} E_{f_W}[J(W)] = \min_{g} \int_0^1 J(w)\, f_W(w)\, dw = \pi^2 \left[ \int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \right]^{-1} \tag{90}$$

where $J$ is defined in (78). The minimum is achieved by

$$g_{\mathrm{opt}}(w) = \frac{1}{2} \left[ 1 - \cos\!\left( \pi\, \frac{\int_0^w \frac{dv}{f_W(v)\, v(1 - v)}}{\int_0^1 \frac{dv}{f_W(v)\, v(1 - v)}} \right) \right]. \tag{91}$$

Proof: We have

$$\begin{aligned}
\int_0^1 J(w)\, f_W(w)\, dw &= \int_0^1 w(1 - w)\, [G'(w)]^2\, f_W(w)\, dw \\
&\overset{(a)}{\ge} \left[ \int_0^1 G'(w)\, dw \right]^2 \left[ \int_0^1 \frac{dw}{w(1 - w)\, f_W(w)} \right]^{-1} \\
&\overset{(b)}{=} \pi^2 \left[ \int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \right]^{-1}
\end{aligned} \tag{92}$$

where (a) follows from the Cauchy-Schwarz inequality and (b) follows from the boundary conditions
$G(0) = 0$ and $G(1) = \pi$ following directly from Condition 2 and the definition of (77). Equality holds
in (a) when

$$G_{\mathrm{opt}}'(w) = \frac{\pi}{\int_0^1 \frac{dv}{f_W(v)\, v(1 - v)}} \cdot \frac{1}{f_W(w)\, w(1 - w)} \tag{93}$$

which leads us to (91) by (77). ■

Corollary 4. For $f_W$ satisfying Condition 3 we have

$$C_k^{\mathrm{joint}}(f_W, \mathscr{P}_{\mathrm{mark}}) \sim \frac{\pi^2}{k^2\, 2 \ln 2} \left[ \int_0^1 \frac{dw}{f_W(w)\, w(1 - w)} \right]^{-1}. \tag{94}$$

Proof: This follows directly from Theorem 8 and Lemma 7. ■

Corollary 5.

$$C_k^{\mathrm{joint}}(f_W^*, \mathscr{P}_{\mathrm{mark}}) \sim \frac{1}{k^2\, 2 \ln 2}. \tag{95}$$

Proof: The right-hand side of (94) is maximized when $f_W = f_W^*$ by Lemma 4. Also note that by
(91) we have $g_{\mathrm{opt}}(w) = w$, which leads us to the interleaving attack. ■

Corollary 6. The interleaving attack is an "equalizing strategy" for the colluders that makes the payoff
function $J(w)$ asymptotically independent of $w$:

$$I_k^{\mathrm{joint}}(w, p^*) \sim \frac{1}{k^2\, 2 \ln 2}, \quad \forall w \in (0, 1). \tag{96}$$

Proof: Let $g(w) = w$. Then $p$ becomes the interleaving attack by (75) and $J(w) = 1$ by Definition
13. ■

Corollary 7. The fingerprinting capacity under the marking assumption satisfies

$$C_k^{\mathrm{joint}}(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}}) \sim \frac{1}{k^2\, 2 \ln 2}. \tag{97}$$

Furthermore, the arcsine distribution $f_W^*$ and the interleaving attack $p^*$ are the respective maximizing
and minimizing strategies that achieve the asymptotic capacity value.

Proof: The asymptotic relations (95) and (96) establish matching asymptotic lower and upper bounds
on $C_k^{\mathrm{joint}}(\mathscr{P}_W, \mathscr{P}_{\mathrm{mark}})$ respectively. ■

Fig. 5. $g_{\mathrm{opt}}(z/k)$ and minimizing collusion channels $p^{(k)}$ for $k$ = 10, 20, and 30 and $P_W = \mathrm{Beta}(\theta, \theta)$ (panels: $\theta$ = 1/3, 1/2, 2/3; horizontal axis: $z/k$)

Fig. 6. $J_{\mathrm{opt}}(w)$ and normalized payoff function $\tilde{I}_k(w, p^{(k)})$ for $k$ = 10, 30, and 50 and $P_W = \mathrm{Beta}(\theta, \theta)$ (panels: $\theta$ = 1/3, 1/2, 2/3)

B. Numerical Results for Beta Distributions 

We now use the family of Beta distributions to illustrate the asymptotics of the joint fingerprinting
game. Let $f_W$ be the pdf defined in (50). Condition 3 is satisfied for any $\theta \in (0, 1)$. For $\theta$ = 1/3, 1/2, and
2/3, we find the minimizing collusion channels $p^{(k)}$ for $f_W$ and compare them with $g_{\mathrm{opt}}(z/k)$ obtained
by (91). Fig. 5 shows that $p^{(k)}$ does converge to $g_{\mathrm{opt}}(z/k)$ as $k \to \infty$ as expected, which also rationalizes
our assumption of Condition 2.

Consider the normalized payoff function $\tilde{I}_k^{\mathrm{joint}}(w, p) = k^2\, 2 \ln 2 \cdot I_k^{\mathrm{joint}}(w, p)$, which by Theorem 8 is
asymptotically close to $J(w)$. Suppose $J_{\mathrm{opt}}$ is obtained by substituting $g_{\mathrm{opt}}$ of (91) into (77) and (78).
Again for $\theta$ = 1/3, 1/2, and 2/3, we compare $\tilde{I}_k^{\mathrm{joint}}(w, p^{(k)})$ with $J_{\mathrm{opt}}(w)$ in Fig. 6. As shown in the
figure, $\tilde{I}_k^{\mathrm{joint}}(w, p^{(k)})$ is asymptotically flat over (0, 1) when $\theta = 1/2$, which is the case when $f_W$ is
chosen properly. If $\theta < 1/2$, which means $f_W$ has too much weight around 0 and 1, then the colluders'
choice of $g_{\mathrm{opt}}$ makes $J_{\mathrm{opt}}$ peak at $w = 1/2$. If on the contrary $\theta > 1/2$, then too much weight is put
around 1/2 by $f_W$, and $g_{\mathrm{opt}}$ makes $J_{\mathrm{opt}}$ peak at $w = 0, 1$.
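The shape of $J_{\mathrm{opt}}$ described above can be reproduced in closed form for the beta family. Substituting $f_W = \mathrm{Beta}(\theta, \theta)$ into (93) and then (78) gives, under the assumption $0 < \theta < 1$, $J_{\mathrm{opt}}(w) = \pi^2 [w(1-w)]^{1 - 2\theta} / B(1-\theta, 1-\theta)^2$. This expression is derived here for illustration and does not appear in the text; the function names are hypothetical.

```python
import math

def beta_fn(a, b):
    """Beta function via the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def j_opt(w, theta):
    """J_opt(w) for f_W = Beta(theta, theta), 0 < theta < 1 (derived from
    (93) and (78); an illustrative closed form, not taken from the text)."""
    return (math.pi ** 2 * (w * (1.0 - w)) ** (1.0 - 2.0 * theta)
            / beta_fn(1.0 - theta, 1.0 - theta) ** 2)

if __name__ == "__main__":
    for theta in (1 / 3, 1 / 2, 2 / 3):
        print(theta, [round(j_opt(w, theta), 4) for w in (0.05, 0.25, 0.5, 0.75, 0.95)])
```

For $\theta = 1/2$ the output is flat and equal to 1; for $\theta = 1/3$ it peaks at $w = 1/2$, and for $\theta = 2/3$ it grows toward the endpoints, in line with the discussion of Fig. 6.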

C. Why Are the Arcsine Distribution and the Interleaving Attack Optimal for Large Coalitions?

The analysis and the numerical results above show not only that the asymptotic capacity is $(k^2\, 2 \ln 2)^{-1}$,
but also that both the arcsine distribution for the maximizer and the interleaving attack for the minimizer
achieve the same asymptotic value. Such results are very interesting, and at the same time raise some
issues for further investigation. One topic concerns the regularity constraint (Condition 2) upon which
the asymptotic analysis in Sec. V-A is based. However, it is reasonable to conjecture that the same
asymptotics hold without the regularity condition. Our numerical results indeed suggest this condition
may not be necessary. Moreover, it is important to mention that both the asymptotic lower bound on
capacity (see [23, Theorem 15]) and Corollary 6 (which contributes to the asymptotic upper bound) hold
without the regularity constraint.

April 20, 2011 DRAFT

Both the arcsine distribution and the interleaving attack have been extensively studied in the literature. In 
2003, Tardos applied the arcsine distribution to fingerprinting [9]. How he fine-tuned his codes, however, 
had been a mystery until Škorić et al. [12] and Furon et al. [19] rationalized Tardos' choices based on
Gaussian approximations. On the other hand, the interleaving attack has been frequently adopted to model 
the collusion channel in the literature Q, lfl9l , 0, but no conceptual reasoning has been proposed on why 
it should be the coalition's optimal choice. Fortunately, owing to the discovery of the capacity formulas 
(Theorem 3), we can now study fingerprinting games from the information-theoretic point of view. In the
previous subsection, we established the optimality of these two strategies based on asymptotic methods. 
Here we provide a statistical interpretation. 

We may think of the (joint) fingerprinting capacity game as follows: the coalition is given $k$ independent
observations $X_1, \dots, X_k$ distributed according to an unknown distribution Bernoulli($W$) chosen at
random by the fingerprinting embedder from the family $\{\mathrm{Bernoulli}(W) : W \in [0, 1]\}$ according to
a known prior distribution $P_W$. Upon generating $Y$ according to the conditional distribution $p_{Y|Z}$ given
the sufficient statistic $Z = \sum_{i=1}^{k} X_i$, the coalition suffers a loss $I(Z; Y | W = w)$. The risk of the game
$I(Z; Y | W)$ is the average loss under $P_W$.

As emphasized in previous works |4), O, the choice of the embedding distribution Pyv, or prior 
selection in statistician's language, is crucial to the fingerprinting game. If no randomization takes place 
|[3l , or equivalently, if the realization w is revealed to the pirates (SJ, then the maximin game value decays 
exponentially with coalition size k (see (I3TI)). Loosely speaking, the loss the pirates suffer is mainly due 
to their error in estimating W. If they have a good estimation of the time-sharing random variable W, 
then the loss they suffer is small. 

Jeffreys' prior [27] is a "non-informative" prior that plays an important role in Bayesian statistics. Given 
a family of distributions with an unknown parameter, Jeffreys' prior is proportional to the square root 
of the Fisher information. Conceptually speaking, Jeffreys' prior is the "least-favorable" distribution in 
regard to estimating that parameter. For the Bernoulli trial with the probability of success w as parameter, 
the Fisher information is $I(w) = [w(1 - w)]^{-1}$ and thus Jeffreys' prior is

$$f(w) \propto \frac{1}{\sqrt{w(1 - w)}} \tag{98}$$

which is exactly the arcsine distribution!
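The Fisher information of the Bernoulli model and the resulting Jeffreys prior can be computed directly from the definition $I(w) = E[(\partial_w \log p(X; w))^2]$. The sketch below is a minimal illustration with hypothetical function names; the normalizing constant $\pi$ is the value of $\int_0^1 [w(1-w)]^{-1/2}\, dw$.

```python
import math

def fisher_information_bernoulli(w):
    """E[(d/dw log p(X; w))^2] for X ~ Bernoulli(w), 0 < w < 1."""
    score_one = 1.0 / w             # d/dw log w, the score when X = 1
    score_zero = -1.0 / (1.0 - w)   # d/dw log(1 - w), the score when X = 0
    return w * score_one ** 2 + (1.0 - w) * score_zero ** 2

def jeffreys_pdf(w):
    """Jeffreys prior sqrt(I(w)) normalized by pi on (0, 1)."""
    return math.sqrt(fisher_information_bernoulli(w)) / math.pi

if __name__ == "__main__":
    for w in (0.1, 0.5, 0.8):
        print(w, fisher_information_bernoulli(w), jeffreys_pdf(w))
```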

The optimality of the interleaving attack is closely related to the concept of "equalizer rule" in game
theory. From Corollary 6, interleaving is the asymptotic equalizing strategy, which is the desirable attribute
we expect for a saddle-point solution. The optimal collusion channel depends on the prior by (91), and
from the proof of Corollary 5, the interleaving attack is optimal under the arcsine distribution. Also observe
that the interleaving attack is the strategy where the colluders generate $Y$ according to Bernoulli($\hat{W}$),
where $\hat{W} = Z/k$ is the maximum likelihood estimator of $W$, which is asymptotically unbiased (as
$k \to \infty$) and has minimum asymptotic variance (equal to $(k I(w))^{-1} = w(1 - w)/k$).

Another interesting question is what the asymptotics of the simple fingerprinting game are. In Corollary
3 we established upper and lower bounds on $C_k^{\mathrm{simple}}$. Note that the upper bound is obtained by assuming
the interleaving attack for the coalition and it coincides with the asymptotic rate $(k^2\, 2 \ln 2)^{-1}$ of $C_k^{\mathrm{joint}}$. On
the other hand, Fig. 4(b) indicates that the optimal attack is actually quite different from the interleaving
attack. This suggests that the pirates can exploit the suboptimality of the single-user decoder and perform a
stronger attack. The study of the exact asymptotics of the simple fingerprinting game is left as future work.

VI. Summary 

In this work, we proved new upper and lower bounds on the maximum achievable rate of binary 
fingerprinting codes for arbitrary coalition size by studying the minimax and the maximin fingerprinting 
games. We also provided asymptotic approximations of the capacity as well as both the fingerprinting 
embedder and the coalition's strategies. The results suggest that fingerprinting games under the Boneh- 
Shaw marking assumption have a close relation to the Fisher information and Jeffreys' prior for the 
Bernoulli model. 

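Appendix
Derivation of $C_k^{\mathrm{simple}}(\mathscr{P}_W, p^*)$

The function $I_k^{\mathrm{simple}}(w, p^*)$ is indeed symmetric around $w = 1/2$ and has a global maximum at
$w = 1/2$ as suggested by numerical experiments in [6]. We first prove the following lemma:

Lemma 8. For $r \ge s \ge 0$ and $r + s \le 1$, we have

$$d(r \,\|\, s) \ge d(s \,\|\, r). \tag{99}$$

Proof: The difference $\delta(r, s)$ between the two sides is

$$\begin{aligned}
\delta(r, s) &= d(r \,\|\, s) - d(s \,\|\, r) \\
&= r \log \frac{r}{s} + (1 - r) \log \frac{1 - r}{1 - s} - s \log \frac{s}{r} - (1 - s) \log \frac{1 - s}{1 - r} \\
&= (r + s) \log \frac{r}{s} + (2 - r - s) \log \frac{1 - r}{1 - s}.
\end{aligned} \tag{100}$$

Then

$$\frac{\partial}{\partial s} \delta(r, s) = \frac{s - r}{s(1 - s) \ln 2} + \log \frac{r(1 - s)}{s(1 - r)} \tag{101}$$

and

$$\frac{\partial^2}{\partial s^2} \delta(r, s) = \frac{(r - s)(1 - 2s)}{s^2 (1 - s)^2 \ln 2} \ge 0 \tag{102}$$

for all $r \ge s$ and $r + s \le 1$. Now when $s \le r \le 1/2$, we have $\frac{\partial}{\partial s} \delta(r, s) \big|_{s = r} = 0$ and $\delta(r, r) = 0$. Thus
$\frac{\partial}{\partial s} \delta(r, s) \le 0$ and thus $\delta(r, s) \ge 0$. When $1/2 < r \le 1 - s$, we have $\frac{\partial}{\partial s} \delta(r, s) \big|_{s = 1 - r} \le 0$ and $\delta(r, 1 - r) = 0$.
Hence similarly $\frac{\partial}{\partial s} \delta(r, s) \le 0$ and hence $\delta(r, s) \ge 0$. To prove the inequality $\frac{\partial}{\partial s} \delta(r, s) \big|_{s = 1 - r} \le 0$, let

$$A(r) = \frac{\partial}{\partial s} \delta(r, s) \Big|_{s = 1 - r} = \frac{1 - 2r}{r(1 - r) \ln 2} + 2 \log \frac{r}{1 - r}$$

and since $A(1/2) = 0$ and $A'(r) = -\frac{(1 - 2r)^2}{r^2 (1 - r)^2 \ln 2} \le 0$, it follows that $A(r) \le 0$ for all $r \in [1/2, 1]$. ■

Now the payoff function can be written as

$$\begin{aligned}
I_k^{\mathrm{simple}}(w, p^*) &\overset{(a)}{=} w\, d(\alpha^{1\prime} p^* \,\|\, \alpha' p^*) + (1 - w)\, d(\alpha^{0\prime} p^* \,\|\, \alpha' p^*) \\
&\overset{(b)}{=} w\, d\!\left( w + \frac{1 - w}{k} \,\Big\|\, w \right) + (1 - w)\, d\!\left( w - \frac{w}{k} \,\Big\|\, w \right)
\end{aligned} \tag{103}$$

where (a) follows from (61) and (b) follows from (54) and Lemma 3, and by which we can easily verify
the symmetry property $I_k^{\mathrm{simple}}(w, p^*) = I_k^{\mathrm{simple}}(1 - w, p^*)$. Hence it suffices to show that $I_k^{\mathrm{simple}}(w, p^*)$
is nondecreasing for $0 \le w \le 1/2$. Taking the derivative of (103) with respect to $w$ and after some
simplifications, we obtain

$$\frac{\partial}{\partial w} I_k^{\mathrm{simple}}(w, p^*) = d(w^+ \,\|\, w^-) - d(w^- \,\|\, w^+) \tag{104}$$

where $w^+ = w + \frac{1 - w}{k}$ and $w^- = w - \frac{w}{k}$. By Lemma 8 it follows that $\frac{\partial}{\partial w} I_k^{\mathrm{simple}}(w, p^*) \ge 0$ for all
$w \in [0, 1/2]$ and hence $I_k^{\mathrm{simple}}(w, p^*)$ achieves its maximum at $w = 1/2$.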
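Both Lemma 8 and the derivative formula (104) lend themselves to a numerical sanity check. The sketch below is an illustration only: it tests the inequality (99) on a grid of admissible pairs and compares (104) with a central finite difference of (103). Function names and the step size are assumptions made for this example.

```python
import math

def d(r, s):
    """Binary divergence d(r || s) in bits, assuming 0 < s < 1."""
    total = 0.0
    if r > 0.0:
        total += r * math.log2(r / s)
    if r < 1.0:
        total += (1.0 - r) * math.log2((1.0 - r) / (1.0 - s))
    return total

def payoff(w, k):
    """Simple payoff under the interleaving attack, as in (103)."""
    return w * d(w + (1 - w) / k, w) + (1 - w) * d(w - w / k, w)

def derivative_formula(w, k):
    """Right-hand side of (104)."""
    w_plus = w + (1 - w) / k
    w_minus = w - w / k
    return d(w_plus, w_minus) - d(w_minus, w_plus)

def finite_difference(w, k, eps=1e-6):
    """Central finite-difference approximation of d/dw of the payoff."""
    return (payoff(w + eps, k) - payoff(w - eps, k)) / (2 * eps)

if __name__ == "__main__":
    k = 5
    for w in (0.1, 0.3, 0.5):
        print(w, derivative_formula(w, k), finite_difference(w, k))
```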